Fish Agent V0.1 3B

High-precision speech-to-speech model for capturing and generating environmental audio information.

Common Product, Productivity, Speech-to-Speech, Text-to-Speech
Fish Agent V0.1 3B is a speech-to-speech model designed to capture and generate environmental audio information with high accuracy. It uses a semantic-token-free architecture, which removes the need for traditional semantic encoders and decoders. It is also a text-to-speech (TTS) model trained on 700,000 hours of multilingual audio content. The model is a continued pre-training of Qwen-2.5-3B-Instruct on 200 billion speech and text tokens. It supports eight languages, including English and Chinese, with approximately 300,000 hours of training data for each of those two languages and around 20,000 hours for each of the others.
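As a rough sanity check on the training-data figures quoted above, the per-language hours can be added up and compared with the stated 700,000-hour total. This is only a sketch: it assumes the "others" are the six remaining languages at roughly 20,000 hours each, and the constant and function names are illustrative, not taken from the model's documentation.

```python
# Approximate figures from the description above (hours of audio).
# Assumption: "others" means each of the six remaining languages.
HOURS_MAJOR = 300_000   # English and Chinese, each
HOURS_OTHER = 20_000    # each remaining language (assumed)
NUM_LANGUAGES = 8
NUM_MAJOR = 2


def estimated_total_hours() -> int:
    """Sum the approximate per-language training hours."""
    num_other = NUM_LANGUAGES - NUM_MAJOR
    return NUM_MAJOR * HOURS_MAJOR + num_other * HOURS_OTHER


def major_language_share() -> float:
    """Fraction of the estimated total contributed by English and Chinese."""
    return NUM_MAJOR * HOURS_MAJOR / estimated_total_hours()


if __name__ == "__main__":
    print(estimated_total_hours())            # 720000
    print(round(major_language_share(), 2))   # 0.83
```

Under these assumptions the estimate comes to about 720,000 hours, in line with the rounded 700,000-hour figure, with English and Chinese together making up roughly five sixths of the data.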

Fish Agent V0.1 3B Visit Over Time

Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Visit Duration: 00:05:32

Fish Agent V0.1 3B Alternatives