Moore Threads recently announced the open-source release of MooER, its large audio understanding model. According to the company, it is the industry's first large open-source speech model for which both training and inference were run on a domestically produced full-featured GPU. MooER supports Chinese and English speech recognition and can also translate Chinese speech into English text.

MooER uses a three-part architecture: an Encoder, an Adapter, and a Decoder, where the Decoder is a large language model (LLM). With this design the model takes in raw audio, extracts features, and performs downstream tasks such as speech recognition and translation. The project team has open-sourced the inference code and a model trained on 5,000 hours of data, and plans to release the training code and an enhanced model trained on 80,000 hours of data.
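To make the Encoder, Adapter, Decoder data flow concrete, here is a toy sketch in plain Python. It is not MooER's code: the function names, the framing step, the frame-stacking adapter, and the stub decoder are all hypothetical stand-ins chosen only to show how three stages can be chained, with the middle stage reshaping encoder output into a shorter sequence for the LLM.

```python
from typing import List

Frame = List[float]


def toy_encoder(audio: List[float], frame_len: int = 4) -> List[Frame]:
    """Hypothetical stand-in for a speech encoder: cut raw samples into frames."""
    n_frames = len(audio) // frame_len
    return [audio[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def toy_adapter(frames: List[Frame], stride: int = 2) -> List[Frame]:
    """Hypothetical adapter: concatenate every `stride` adjacent frames.

    This shortens the sequence handed to the decoder. Whether MooER's
    adapter works this way is an assumption made for illustration.
    """
    n_out = len(frames) // stride
    return [sum(frames[i * stride:(i + 1) * stride], []) for i in range(n_out)]


def toy_decoder(embeddings: List[Frame], task: str) -> str:
    """Stub for the LLM decoder: reports what it received instead of generating text."""
    return f"<{task}: {len(embeddings)} audio embeddings>"


def run_pipeline(audio: List[float], task: str = "asr") -> str:
    """Chain the three stages: raw audio -> features -> adapted features -> text."""
    return toy_decoder(toy_adapter(toy_encoder(audio)), task)


if __name__ == "__main__":
    samples = [0.0] * 16  # 16 dummy audio samples
    print(run_pipeline(samples, task="asr"))
    print(run_pipeline(samples, task="zh2en"))
```

In this sketch the same encoder and adapter output feeds both tasks, and only the task label passed to the decoder changes, which mirrors the article's point that one model handles both recognition and translation.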


In comparisons with several well-known open-source audio understanding models, MooER-5K, the model trained on 5,000 hours of data, performed well. Its Character Error Rate (CER) in the Chinese tests was 4.21% and its Word Error Rate (WER) in the English tests was 17.98%, matching or outperforming the other models. On the CoVoST2 zh2en Chinese-to-English test set, MooER reached a BLEU score of 25.2, well ahead of the other open-source models and at a level reported to be comparable to industrial applications.

The MooER-80K model, trained on 80,000 hours of data, performs better still: its CER on the Chinese test set falls to 3.50% and its WER on the English test set falls to 12.66%.
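For readers unfamiliar with the metrics, CER and WER are both edit-distance ratios: the number of substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the reference length (in characters for CER, in words for WER). The following is a minimal sketch of that standard definition; it is not the evaluation script used for MooER, and the text normalization such scripts normally apply (case, punctuation) is omitted.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words: List[str] = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits divided by reference length.

    Spaces are dropped here, a common choice for Chinese text.
    """
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, list(hypothesis.replace(" ", ""))) / len(ref_chars)


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(cer("今天天气很好", "今天天汽很好"))
```

Lower is better for both metrics, so the drop from 4.21% to 3.50% CER means roughly seven fewer character errors per thousand reference characters.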

Moore Threads' open-source release of MooER demonstrates that domestic GPUs can handle both training and inference for large AI models, and it adds a new open model to the audio AI ecosystem. As more training data and code are released, the industry expects MooER to bring further advances in speech recognition, translation, and related areas, and to help make audio AI technology more widely used.

Paper: https://arxiv.org/pdf/2408.05101