SeamlessM4T: Multimodal Speech and Text Translation

Meta (formerly Facebook) has announced a new translation model and library:

Today, we’re introducing SeamlessM4T, the first all-in-one multimodal and multilingual AI translation model that allows people to communicate effortlessly through speech and text across different languages. SeamlessM4T supports:

Speech recognition for nearly 100 languages

Speech-to-text translation for nearly 100 input and output languages

Speech-to-speech translation, supporting nearly 100 input languages and 36 (including English) output languages

Text-to-text translation for nearly 100 languages

Text-to-speech translation, supporting nearly 100 input languages and 35 (including English) output languages

The open source library is available on GitHub, and the model itself can be downloaded from Hugging Face. The nicest thing about all of this is that, unlike hosted translation services, you can run it entirely offline and perform inference on your own hardware.
