MooER: The Open-Source Audio Understanding Model by Moore Threads

AIbase基地

Published in AI News · 4 min read · Aug 26, 2024


Moore Threads recently announced the open-source release of MooER (Moor), its large audio understanding model, making it the industry's first large-scale open-source speech model to be both trained and run for inference on a domestically produced full-featured GPU. MooER supports Chinese and English speech recognition and can also translate Chinese speech into English.

MooER uses a three-part model structure: an Encoder, an Adapter, and a Decoder (a Large Language Model, LLM). This design lets the model process raw audio, extract features, and perform downstream tasks such as speech recognition and translation. The project team has open-sourced the inference code and a model trained on 5,000 hours of data, and plans to open-source the training code and an enhanced model trained on 80,000 hours of data.
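To make the data flow concrete, here is a minimal toy sketch of an Encoder, Adapter, Decoder pipeline in plain Python. It is not MooER's code: the function names, the frame size, and the idea that the adapter shortens the sequence by merging adjacent frames are all assumptions made for illustration, and the "decoder" only reports what it received instead of running an LLM.

```python
from typing import List

Frame = List[float]


def toy_encoder(samples: List[float], frame_size: int = 4) -> List[Frame]:
    """Hypothetical stand-in for a speech encoder: cut raw samples into frames."""
    last_start = len(samples) - frame_size + 1
    return [samples[i:i + frame_size] for i in range(0, last_start, frame_size)]


def toy_adapter(frames: List[Frame], stride: int = 2) -> List[Frame]:
    """Hypothetical stand-in for an adapter: merge `stride` adjacent frames.

    Assumption for illustration: the adapter shortens the sequence and
    reshapes features into the size the decoder expects.
    """
    merged_frames: List[Frame] = []
    for i in range(0, len(frames) - stride + 1, stride):
        merged: Frame = []
        for frame in frames[i:i + stride]:
            merged.extend(frame)
        merged_frames.append(merged)
    return merged_frames


def toy_decoder(audio_embeddings: List[Frame], prompt: str) -> str:
    """Hypothetical stand-in for the LLM: describe the inputs it was given."""
    dim = len(audio_embeddings[0]) if audio_embeddings else 0
    return f"{prompt} [{len(audio_embeddings)} audio embeddings of size {dim}]"


# 16 raw samples -> 4 frames of 4 values -> 2 embeddings of 8 values.
raw_audio = [0.1 * n for n in range(16)]
features = toy_encoder(raw_audio)
embeddings = toy_adapter(features)
summary = toy_decoder(embeddings, "Transcribe the speech:")
print(summary)
```

The same audio embeddings could be paired with a different prompt (for example, a translation instruction), which is one way a single three-part structure can serve both recognition and translation.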

In comparative tests against several well-known open-source large audio understanding models, MooER-5K performed well. On the Chinese tests its Character Error Rate (CER) was 4.21%, and on the English tests its Word Error Rate (WER) was 17.98%, matching or outperforming the other models. Notably, on the CoVoST 2 zh2en Chinese-to-English test set, MooER's BLEU score reached 25.2, well ahead of the other open-source models and comparable to industrial-grade systems.
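For readers unfamiliar with the two error metrics, the sketch below shows how CER and WER are commonly computed: the edit distance between the reference transcript and the model output, divided by the reference length, counted in characters for CER and in words for WER. The example sentences are invented, and real evaluations usually normalize text (punctuation, casing) first, which this sketch does not do.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, ref_token in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_token in enumerate(hyp, start=1):
            cost = 0 if ref_token == hyp_token else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance, spaces ignored."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Invented examples: one substitution plus one deletion out of six words,
# and one wrong character out of six.
english_wer = wer("the cat sat on the mat", "the cat sit on mat")
chinese_cer = cer("今天天气很好", "今天天汽很好")
print(f"WER = {english_wer:.2%}, CER = {chinese_cer:.2%}")
```

CER is the usual choice for Chinese because written Chinese has no spaces between words, so counting characters avoids depending on a word segmenter.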

The MooER-80K model, trained on 80,000 hours of data, performed better still: its CER on the Chinese test set fell to 3.50%, and its WER on the English test set fell to 12.66%.
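As a rough way to read these numbers, the snippet below computes the relative error reduction from the 5,000-hour model to the 80,000-hour model. It assumes, as the article implies but does not state outright, that both models were scored on the same Chinese and English test sets; the helper name is hypothetical.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which the error rate dropped relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Error rates in percent, as quoted in the article.
cer_gain = relative_reduction(4.21, 3.50)    # Chinese CER, 5K vs 80K hours
wer_gain = relative_reduction(17.98, 12.66)  # English WER, 5K vs 80K hours

print(f"Relative CER reduction: {cer_gain:.1%}")  # 16.9%
print(f"Relative WER reduction: {wer_gain:.1%}")  # 29.6%
```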

Moore Threads' open-source release of MooER demonstrates that domestic GPUs can support large-scale AI training and inference, and it adds a new option to the global audio AI ecosystem. As more training data and code are open-sourced, the industry expects MooER to drive further advances in speech recognition, translation, and related areas, and to broaden the adoption of audio AI technology.

Paper: https://arxiv.org/pdf/2408.05101


This article is from AIbase Daily

