Make-An-Audio 2
Text-to-audio generation technology based on diffusion models
Make-An-Audio 2 is a diffusion-based text-to-audio generation model co-developed by researchers from Zhejiang University, ByteDance, and the Chinese University of Hong Kong. It uses pre-trained large language models (LLMs) to parse the input text into structured event-and-order pairs, which improves semantic alignment and temporal consistency in the generated audio. A feed-forward Transformer-based diffusion denoiser improves variable-length audio generation and strengthens the extraction of temporal information. Finally, LLMs are used to convert abundant audio-label data into audio-text datasets, which alleviates the scarcity of training data that carries temporal information.
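To make the label-to-text conversion more concrete, here is a minimal Python sketch of the general idea: labeled audio events with timestamps are turned into (event, order) pairs and then into a caption. This is an illustration only, not the Make-An-Audio 2 implementation. The names `LabeledClip`, `order_tag`, `to_structured`, and `to_caption` are hypothetical, as are the order tags and thresholds, and a fixed template stands in for the LLM that the real system uses to write captions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledClip:
    """One labeled event placed inside a longer mixed clip (hypothetical type)."""
    label: str    # event label from an audio-label dataset, e.g. "rain"
    start: float  # start time in seconds
    end: float    # end time in seconds


def order_tag(clip: LabeledClip, total: float) -> str:
    """Map an event's position to a coarse tag (assumed vocabulary)."""
    if clip.end - clip.start >= 0.8 * total:
        return "all"  # event covers most of the clip
    center = (clip.start + clip.end) / 2
    if center < total / 3:
        return "start"
    if center < 2 * total / 3:
        return "mid"
    return "end"


def to_structured(clips: list[LabeledClip], total: float) -> list[tuple[str, str]]:
    """Turn labeled events into (event, order) pairs, sorted by start time."""
    ordered = sorted(clips, key=lambda c: c.start)
    return [(c.label, order_tag(c, total)) for c in ordered]


# Template stand-in for the LLM captioning step (assumption for illustration).
PHRASES = {
    "start": "{} at the start",
    "mid": "{} in the middle",
    "end": "{} at the end",
    "all": "{} throughout",
}


def to_caption(pairs: list[tuple[str, str]]) -> str:
    """Render (event, order) pairs as a plain-text caption."""
    return ", ".join(PHRASES[order].format(event) for event, order in pairs)


clips = [
    LabeledClip("dog barking", 0.0, 3.0),
    LabeledClip("rain", 0.0, 10.0),
    LabeledClip("door slamming", 8.0, 9.5),
]
pairs = to_structured(clips, total=10.0)
caption = to_caption(pairs)
print(pairs)
print(caption)
```

In a real pipeline, an LLM would replace the template so that captions vary in wording while still reflecting the order of events.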
Make-An-Audio 2 Visits Over Time
Monthly Visits: 320
Bounce Rate: 54.91%
Pages per Visit: 1.0
Visit Duration: 00:00:00