Moore Threads recently announced the official open-source release of its audio understanding large model, MooER, the industry's first large-scale open-source speech model trained and run for inference on a domestically produced full-featured GPU, showcasing the company's latest achievements in artificial intelligence.

The MooER model completed training on 5,000 hours of audio data with pseudo-labels in just 38 hours on Moore Threads' KUAE intelligent computing platform, a result attributed to the combination of the company's proprietary algorithms and efficient computing resources. MooER supports speech recognition in both Chinese and English as well as Chinese-to-English speech translation, and performs strongly across multiple speech recognition test sets. Notably, on the CoVoST2 Chinese-to-English test set, MooER-5K achieved a BLEU score of 25.2, approaching industrial-grade quality.


Moore Threads' AI team has open-sourced the inference code and the model trained on 5,000 hours of data, with plans to further release the training code and a model trained on 80,000 hours of data. The MooER architecture consists of three parts: an Encoder, an Adapter, and a Decoder, with the open-source Paraformer speech encoder and the Qwen2-7B-Instruct large language model used to initialize the Encoder and the LLM-based Decoder, respectively.
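The Encoder-Adapter-Decoder pipeline described above can be sketched in a few lines. This is a minimal illustrative mock, not Moore Threads' implementation: the dimensions (512 encoder features, a 3584-dimensional LLM embedding space matching Qwen2-7B's hidden size) and the 4x downsampling stride are assumptions chosen for the example.

```python
# Sketch of a MooER-style pipeline:
# audio -> Encoder (Paraformer-initialized) -> Adapter -> Decoder (Qwen2-7B-Instruct).
# All dimensions and the downsampling factor are illustrative assumptions,
# not values published by Moore Threads.

def encode(audio_frames, enc_dim=512):
    # Stand-in for the speech encoder: one feature vector per input frame.
    return [[0.0] * enc_dim for _ in audio_frames]

def adapt(features, llm_dim=3584, stride=4):
    # Stand-in for the adapter: downsample in time and project into the
    # LLM embedding space so audio tokens align with text tokens.
    pooled = features[::stride]
    return [[0.0] * llm_dim for _ in pooled]

def decode(audio_embeddings, prompt_tokens):
    # Stand-in for the LLM decoder: it consumes the prompt plus the adapted
    # audio embeddings and would autoregressively emit a transcription
    # or translation. Here we just report the context length it sees.
    return len(prompt_tokens) + len(audio_embeddings)

frames = list(range(100))          # 100 audio frames
feats = encode(frames)             # 100 vectors of size 512
emb = adapt(feats)                 # 25 vectors of size 3584 after 4x downsampling
context = decode(emb, ["<asr>"])   # context length seen by the LLM
```

The adapter's downsampling is the key design point this sketch highlights: it shortens the audio sequence so that the LLM processes far fewer audio tokens than raw encoder frames.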

In technical comparisons, MooER-5K outperforms other open-source models on both Chinese and English test sets. With this open-source release, Moore Threads offers a valuable reference and practical support for developers with limited data and computing resources.

GitHub: https://github.com/MooreThreads/MooER