Meta has introduced Seamless Communication, a family of four speech translation models that support real-time translation between nearly 100 languages with a latency of about 2 seconds. The models preserve expressive features of the source speech, such as pauses, intonation, and speaking rate, so the translated speech sounds more natural. They use a non-autoregressive architecture to handle long-sequence translation. Meta has also open-sourced the models along with the largest speech corpus to date, totaling 585,000 hours of audio, and has added audio watermarking and translation toxicity mitigation to deter misuse.
Meta Releases New Voice Translation Model Supporting Tone and Speed Imitation
QbitAI (量子位)
© Copyright AIbase Base 2024. Source: https://www.aibase.com/news/4426