Meta has open-sourced SeamlessM4T, which it describes as the first all-in-one multimodal, multilingual AI translation model; it supports nearly 100 languages and can recognize local dialects. The model handles translation tasks across modalities, including speech-to-text, speech-to-speech, text-to-speech, and text-to-text, as well as automatic speech recognition. SeamlessM4T builds on Meta's earlier translation models, such as NLLB (No Language Left Behind) and MMS (Massively Multilingual Speech), and was trained on a large corpus of aligned speech and text. It achieves state-of-the-art results across these translation tasks and shows strong robustness to background noise and speaker variation. It also markedly improves translation quality for low-resource languages.
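
For readers who want to try the released model, below is a minimal sketch using the Hugging Face transformers integration (an assumption on my part: it presumes transformers >= 4.35 and the facebook/hf-seamless-m4t-medium checkpoint; the example sentence and language codes are illustrative, not from the announcement):

```python
# Minimal sketch: text-to-text and text-to-speech translation with SeamlessM4T
# via the Hugging Face transformers integration. Assumes transformers >= 4.35
# and the facebook/hf-seamless-m4t-medium checkpoint; the sentence and
# language codes are illustrative.
from transformers import AutoProcessor, SeamlessM4TModel

processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium")

# Tokenize an English source sentence.
text_inputs = processor(text="Hello, how are you?", src_lang="eng",
                        return_tensors="pt")

# Text-to-text: generate_speech=False makes generate() return token ids.
output_tokens = model.generate(**text_inputs, tgt_lang="fra",
                               generate_speech=False)
translated_text = processor.decode(output_tokens[0].tolist()[0],
                                   skip_special_tokens=True)
print(translated_text)

# Text-to-speech: the same call without generate_speech=False returns a
# waveform (a 16 kHz audio array) in the target language.
audio = model.generate(**text_inputs, tgt_lang="fra")[0].cpu().numpy().squeeze()
```

The same generate() interface covers the speech-input tasks as well: passing processed audio instead of text to the processor yields speech-to-text or speech-to-speech translation, which is what makes the single model "all-in-one".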