Fish Audio has released Fish Speech 1.5, a new speech synthesis model. It improves on its predecessors in accuracy, stability, and cross-language capability, and it adds support for five new languages in one go. Fish Audio also says a real-time seamless conversation feature is coming soon, which will let users pick a voice from a voice library and chat interactively anytime, anywhere.

Fish Speech 1.5 was trained on over 1 million hours of multilingual data and now supports 13 languages, including English, Chinese, and Japanese. This is not just talk: it took second place in the anonymous TTS-Arena ranking.

Voice cloning is fast as well, with latency under 150 milliseconds, which makes it nearly real-time. More importantly, Fish Audio has open-sourced the pre-trained model, so you can run it on your own hardware or use the cloud service.

Main Features:

Zero-shot and few-shot speech synthesis: Give it a 10- to 30-second audio sample and it generates high-quality speech that imitates the sampled voice, with no further training required.

Multilingual and cross-language support: Just copy and paste multilingual text into the input box and the model handles it, with no need to worry about which language it is. Supported languages include English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.

No phoneme dependency: Traditional speech synthesis models often rely on phonemes, while Fish Speech 1.5 does not. It generalizes well and can handle text in any language script.

Highly accurate: For a 5-minute English text, Fish Speech 1.5 achieves character and word error rates (CER and WER) of around 2%.

Fast: On an Nvidia RTX 4060 laptop, the real-time factor is about 1:5, and on an Nvidia RTX 4090 it reaches about 1:15.
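
The accuracy and speed figures above can be made concrete with a little arithmetic. The sketch below is not taken from the Fish Speech codebase. It assumes the error rate is a word error rate computed by word-level edit distance against the input text, and it reads a ratio such as 1:5 as one second of compute for every five seconds of generated audio. Both function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    `reference` is the text that was meant to be spoken; `hypothesis` is a
    transcript of the generated audio (how it is obtained is left open here).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def synthesis_seconds(audio_seconds: float, audio_per_compute_second: float) -> float:
    """Estimated compute time for a clip, assuming a ratio of 1:N means
    N seconds of audio are produced per second of compute."""
    if audio_seconds < 0 or audio_per_compute_second <= 0:
        raise ValueError("audio length must be >= 0 and the ratio must be > 0")
    return audio_seconds / audio_per_compute_second
```

Under these assumptions, a 5-minute (300-second) article would take roughly `synthesis_seconds(300, 5)` = 60 seconds to generate at 1:5 and `synthesis_seconds(300, 15)` = 20 seconds at 1:15, and a 2% word error rate corresponds to about one wrong word in every fifty.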

Fish Speech 1.5 also supports local deployment:

WebUI: A simple, user-friendly web interface that works in major browsers such as Chrome, Firefox, and Edge.

GUI: A PyQt6 graphical interface that works together with the API server and supports Linux, Windows, and macOS.

Deployment-friendly: You can deploy Fish Speech 1.5 on Linux, Windows, and macOS with minimal speed loss.

Official website: https://fish.audio/zh-CN/

Project address: https://github.com/fishaudio/fish-speech