Chinese speech recognition has long attracted significant attention. Recently, the FireRed team at Xiaohongshu released a new open-source speech recognition model, FireRedASR. This large-model-based system has achieved outstanding results on multiple standard test sets, marking a major step forward for Chinese speech recognition technology.


FireRedASR is evaluated primarily on Character Error Rate (CER); the lower the CER, the better the model's recognition performance. In recent public tests, FireRedASR achieved a CER of 3.05%, an 8.4% relative reduction compared with the previous best model, Seed-ASR. This result demonstrates the FireRed team's capacity for innovation in speech recognition.
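To make the metric concrete, here is a minimal sketch of how CER is commonly computed: the character-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the model's output, divided by the number of reference characters. The function names and the sample sentences are illustrative assumptions, not part of the FireRedASR codebase, and real evaluation pipelines usually also normalize text (punctuation, casing, numerals) before scoring.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one wrong character in a six-character reference.
reference = "今天天气很好"
hypothesis = "今天天气真好"
print(f"CER = {cer(reference, hypothesis):.2%}")
```

Because insertions count as errors, CER can exceed 100% when the hypothesis is much longer than the reference.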

FireRedASR comes in two variants: FireRedASR-LLM and FireRedASR-AED. The former aims for the highest possible recognition accuracy, while the latter balances accuracy against inference efficiency. The team has released models of various sizes, along with inference code, to suit different application scenarios.

FireRedASR has also performed strongly in everyday application scenarios. On a test set drawn from various sources, including short videos, live broadcasts, and voice input, FireRedASR-LLM achieved a relative CER reduction of 23.7% to 40% compared with leading industry service providers. The model did especially well on lyric recognition, with a relative CER reduction of 50.2% to 66.7%.
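Reduction figures like these are usually relative rather than absolute: they express the drop in CER as a fraction of the baseline's CER. The short sketch below shows the arithmetic. The function name and the input values are hypothetical, chosen only to illustrate the calculation, and are not figures from the FireRedASR evaluation.

```python
def relative_cer_reduction(baseline_cer: float, new_cer: float) -> float:
    """Relative CER reduction, in percent, of a new system over a baseline."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return 100.0 * (baseline_cer - new_cer) / baseline_cer


# Hypothetical numbers: a baseline at 10.0% CER and a new system at 6.0% CER.
baseline, new = 10.0, 6.0
print(f"absolute drop: {baseline - new:.1f} percentage points")
print(f"relative reduction: {relative_cer_reduction(baseline, new):.1f}%")
```

In this made-up case the absolute drop is 4 percentage points while the relative reduction is 40%, which is why it matters which of the two a reported figure refers to.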

FireRedASR also handles Chinese dialects and English well: on the KeSpeech and LibriSpeech test sets, its CER is significantly lower than that of previous open-source models, demonstrating its robustness and adaptability across language environments.

The FireRed team hopes to promote the development and application of speech recognition technology through the open-sourcing of this new model, contributing to the future of voice interaction. All models and code have been made public on GitHub, encouraging more developers and researchers to participate.

Hugging Face: https://huggingface.co/FireRedTeam

GitHub: https://github.com/FireRedTeam/FireRedASR

Highlights:

- 🎤 FireRedASR is the newly released open-source speech recognition model from the Xiaohongshu team, with excellent accuracy in recognizing Chinese.

- 🚀 The model comes in two variants, FireRedASR-LLM and FireRedASR-AED, which prioritize accuracy and efficiency respectively.

- 🌍 FireRedASR performs well across a range of scenarios and language environments, including Mandarin, Chinese dialects, and English.