TangoFlux
An efficient text-to-audio generation model
Categories: Common Product, Music, Text-to-audio, Audio generation
TangoFlux is an efficient text-to-audio (TTA) generation model with 515M parameters. It can generate up to 30 seconds of 44.1 kHz audio in just 3.7 seconds on a single A40 GPU. To address the alignment challenges of TTA models, it introduces CLAP-Ranked Preference Optimization (CRPO), a framework that improves alignment by iteratively generating preference data and optimizing the model on it. TangoFlux achieves state-of-the-art results on both objective and subjective benchmarks, and all code and models are open source to support further research in TTA generation.
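The preference-data step of CRPO can be illustrated with a small sketch. It is not TangoFlux's code and rests on stated assumptions: `generate` and `clap_score` are hypothetical callables standing in for sampling candidates from the current model and for a CLAP text-audio similarity scorer, and choosing the highest- and lowest-scoring candidates as the chosen/rejected pair is an assumed selection rule. In an iterative loop, the pairs from one round would feed a preference-optimization step, and the updated model would then generate the next round's candidates.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical stand-ins (assumptions, not a real API):
#   generate(prompt, n) -> n candidate audio clips (represented here as string IDs)
#   clap_score(prompt, audio) -> text-audio similarity score, higher is better
GenerateFn = Callable[[str, int], List[str]]
ScoreFn = Callable[[str, str], float]


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # candidate with the highest CLAP score
    rejected: str  # candidate with the lowest CLAP score
    margin: float  # score gap between chosen and rejected


def build_preference_pairs(
    prompts: Sequence[str],
    generate: GenerateFn,
    clap_score: ScoreFn,
    num_candidates: int = 5,
) -> List[PreferencePair]:
    """Rank each prompt's candidates by CLAP score and keep the best/worst pair."""
    pairs: List[PreferencePair] = []
    for prompt in prompts:
        candidates = generate(prompt, num_candidates)
        # Score every candidate once against its prompt.
        scored = [(clap_score(prompt, audio), audio) for audio in candidates]
        best_score, best = max(scored, key=lambda s: s[0])
        worst_score, worst = min(scored, key=lambda s: s[0])
        # Skip prompts where all candidates tie, since they give no preference signal.
        if best_score > worst_score:
            pairs.append(PreferencePair(prompt, best, worst, best_score - worst_score))
    return pairs


if __name__ == "__main__":
    # Toy demo: candidate k of a prompt is scored 0.1 * k, so the last one ranks best.
    def toy_generate(prompt: str, n: int) -> List[str]:
        return [f"{prompt}#{k}" for k in range(n)]

    def toy_score(prompt: str, audio: str) -> float:
        return 0.1 * int(audio.rsplit("#", 1)[1])

    for pair in build_preference_pairs(["rain on a tin roof"], toy_generate, toy_score):
        print(pair.chosen, pair.rejected, round(pair.margin, 2))
```

Keeping only the extreme-scoring candidates yields one clear-cut pair per prompt; other selection rules (for example, several pairs per prompt) are equally plausible variants of this sketch.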