Johns Hopkins University and Tencent AI Lab have jointly introduced EzAudio, a new text-to-audio generation model. The system promises efficient, high-quality audio generation from text prompts, marking a notable step forward in AI audio technology.


EzAudio operates in the latent space of audio waveforms rather than on traditional spectrograms, which lets it work at high temporal resolution while eliminating the need for an additional neural vocoder.
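To make that concrete, the sketch below illustrates the general waveform-latent pattern, not EzAudio's actual code: a convolutional autoencoder compresses a raw waveform into a compact 1D latent sequence, the diffusion model denoises in that latent space, and the decoder maps the result straight back to a waveform, so no separate vocoder stage is needed. All module names and dimensions here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WaveformVAE(nn.Module):
    """Compresses a raw waveform into a 1D latent sequence and back.
    Because the latent encodes the waveform directly, the decoder
    reconstructs audio without a separate neural vocoder.
    (Hypothetical sketch; not EzAudio's implementation.)"""
    def __init__(self, latent_dim=128, hop=512):
        super().__init__()
        self.encoder = nn.Conv1d(1, latent_dim, kernel_size=hop, stride=hop)
        self.decoder = nn.ConvTranspose1d(latent_dim, 1, kernel_size=hop, stride=hop)

    def encode(self, wav):           # wav: (batch, 1, samples)
        return self.encoder(wav)     # latent: (batch, latent_dim, frames)

    def decode(self, z):
        return self.decoder(z)       # waveform: (batch, 1, samples)

vae = WaveformVAE()
wav = torch.randn(1, 1, 512 * 100)   # dummy audio, ~100 latent frames
z = vae.encode(wav)                  # the diffusion model would denoise here
print(z.shape)                       # torch.Size([1, 128, 100])
recon = vae.decode(z)                # straight back to a waveform
```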

EzAudio's architecture, dubbed EzAudio-DiT (Diffusion Transformer), incorporates several design changes aimed at performance and efficiency: a new adaptive layer normalization scheme called AdaLN-SOLA, long skip connections, and rotary position embeddings (RoPE).
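For readers unfamiliar with these building blocks, here is a minimal sketch of the two standard techniques they build on: plain adaptive layer normalization (AdaLN), as used in the original Diffusion Transformer, where a conditioning vector produces a per-channel scale and shift, and the common RoPE formulation. AdaLN-SOLA is EzAudio's own lighter variant; its details are in the paper, and the code below is a generic illustration, not their implementation.

```python
import torch
import torch.nn as nn

class AdaLN(nn.Module):
    """Plain adaptive LayerNorm (original-DiT style): a conditioning
    vector (e.g., diffusion timestep + text embedding) is projected to
    a per-channel scale and shift that modulate normalized activations.
    EzAudio's AdaLN-SOLA is a more parameter-efficient variant of this idea."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.to_mod = nn.Linear(dim, 2 * dim)

    def forward(self, x, cond):          # x: (B, T, D), cond: (B, D)
        scale, shift = self.to_mod(cond).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

def rope(x, base=10000.0):
    """Rotary position embedding: rotate channel pairs by an angle that
    grows with token position, making attention relative-position aware."""
    _, n, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(n, dtype=torch.float32)[:, None] * freqs  # (n, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

x = torch.randn(2, 16, 64)               # (batch, tokens, channels)
cond = torch.randn(2, 64)
print(AdaLN(64)(x, cond).shape, rope(x).shape)
```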

Researchers report that the audio samples generated by EzAudio are highly realistic, outperforming existing open-source models in both objective and subjective evaluations.

The AI audio generation market is growing rapidly. ElevenLabs, for example, recently launched an iOS app for text-to-speech conversion, a sign of strong consumer interest in AI audio tools, while tech giants such as Microsoft and Google continue to increase their investments in AI voice technology.

According to Gartner's predictions, by 2027, 40% of generative AI solutions will be multimodal, combining text, image, and audio capabilities. This suggests that high-quality audio generation models like EzAudio may play a significant role in the evolving AI landscape.

The EzAudio team has publicly released their code, dataset, and model checkpoints, emphasizing transparency and encouraging further research in the field.

The researchers believe EzAudio's applications may extend beyond sound-effect generation to areas such as voice and music production. As the technology matures, it could find widespread use in entertainment, media, accessibility services, and virtual assistants.

Demo: https://huggingface.co/spaces/OpenSound/EzAudio

Project page: https://github.com/haidog-yaqub/EzAudio?tab=readme-ov-file
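For those who want to try the model programmatically rather than through the web UI, the hosted demo is a Gradio Space, so it can in principle be driven with the gradio_client library. The endpoint name and argument below are assumptions for illustration; the Space's "Use via API" panel shows the actual signature.

```python
# pip install gradio_client
from gradio_client import Client

# Connect to the public demo Space.
client = Client("OpenSound/EzAudio")

# The api_name and argument below are hypothetical placeholders --
# check the Space's "Use via API" panel for the real endpoint.
result = client.predict(
    "a dog barking in the distance",  # hypothetical text-prompt argument
    api_name="/generate",             # hypothetical endpoint name
)
print(result)  # typically a path to the generated audio file
```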

Key Points:

🌟 EzAudio is a new text-to-audio generation model developed through a collaboration between Johns Hopkins University and Tencent, marking a significant advancement in audio technology.

🎧 The model generates audio samples of superior quality compared to existing open-source models, with broad application potential.

⚖️ As the technology advances, questions of ethics and responsible use become more pressing; the public release of EzAudio's code gives the community an opportunity to examine its risks and benefits openly.