Make-An-Audio 2

Text-to-audio generation technology based on diffusion models

Make-An-Audio 2 is a diffusion-based text-to-audio generation model co-developed by researchers from Zhejiang University, ByteDance, and the Chinese University of Hong Kong. It uses pre-trained large language models (LLMs) to parse the input text, which improves semantic alignment and temporal consistency and, in turn, the quality of the generated audio. It also uses a feed-forward Transformer-based diffusion denoiser, which improves the generation of variable-length audio and strengthens the extraction of temporal information. To address the scarcity of temporally annotated data, LLMs are also used to convert abundant audio-label data into audio-text datasets.
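The idea of turning audio labels into text with temporal information can be illustrated with a small sketch. This is not the project's code: a simple rule-based template stands in for the LLM, and the names `LabeledEvent`, `coarse_position`, and `build_caption`, the position tags, the 0.8 duration threshold, and the output format are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LabeledEvent:
    """One labeled sound event in a clip (hypothetical structure)."""
    label: str     # e.g. "dog barking"
    onset: float   # seconds from the start of the clip
    offset: float  # seconds from the start of the clip


def coarse_position(event, clip_len):
    """Map an event's time span to a coarse tag: all, start, mid, or end."""
    # Assumption: an event covering at least 80% of the clip counts as "all".
    if (event.offset - event.onset) >= 0.8 * clip_len:
        return "all"
    center = (event.onset + event.offset) / 2
    if center < clip_len / 3:
        return "start"
    if center < 2 * clip_len / 3:
        return "mid"
    return "end"


def build_caption(events, clip_len):
    """Join events in onset order into one structured caption string."""
    ordered = sorted(events, key=lambda e: e.onset)
    return " @ ".join(
        f"<{e.label} & {coarse_position(e, clip_len)}>" for e in ordered
    )


events = [
    LabeledEvent("door slams", 8.0, 9.0),
    LabeledEvent("rain falling", 0.0, 10.0),
    LabeledEvent("dog barking", 1.0, 2.5),
]
print(build_caption(events, clip_len=10.0))
# prints: <rain falling & all> @ <dog barking & start> @ <door slams & end>
```

In a real pipeline, an LLM would rewrite such label-and-order pairs into natural-language captions. The template here only shows how temporal order can be recovered from label data that carries timestamps.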

Make-An-Audio 2 Visit Over Time

Monthly Visits: 166
Bounce Rate: 47.04%
Pages per Visit: 1.0
Visit Duration: 00:00:00
