TangoFlux

An efficient text-to-audio generation model

Common Product, Music, Text-to-audio, Audio generation
TangoFlux is an efficient text-to-audio (TTA) generation model with 515M parameters. It can generate up to 30 seconds of 44.1 kHz audio in 3.7 seconds on a single A40 GPU, roughly 8 times faster than real time. The model introduces the CLAP-Ranked Preference Optimization (CRPO) framework to address the alignment challenges of TTA models: it improves alignment by iteratively generating preference data and optimizing on it. TangoFlux achieves state-of-the-art performance on both objective and subjective benchmarks, and all code and models are open source to support further research in TTA generation.
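To make the idea of CLAP-ranked preference data more concrete, here is a minimal sketch of one plausible reading of the name: for each prompt, several candidates are generated, each is scored against the prompt, and the best- and worst-scoring candidates form a (winner, loser) pair for a later preference-optimization step. The description above does not spell out the procedure, so the function names (`build_preference_pairs`, `toy_generate`, `toy_score`), the candidate count, and the best-versus-worst pairing rule are all assumptions for illustration. The toy scorer merely stands in for a real CLAP text-audio similarity model and is not the TangoFlux API.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[str, int]  # (prompt, seed): a stand-in for a generated audio clip


def build_preference_pairs(
    prompts: Sequence[str],
    generate: Callable[[str, int], Candidate],
    score: Callable[[str, Candidate], float],
    n_candidates: int = 4,
) -> List[Tuple[str, Candidate, Candidate]]:
    """Return (prompt, winner, loser) triples ranked by `score`.

    Assumption: the winner is the highest-scoring candidate and the loser
    the lowest-scoring one. `score` is expected to behave like a CLAP
    text-audio similarity, where higher means better aligned.
    """
    pairs = []
    for prompt in prompts:
        candidates = [generate(prompt, seed) for seed in range(n_candidates)]
        ranked = sorted(candidates, key=lambda c: score(prompt, c))
        pairs.append((prompt, ranked[-1], ranked[0]))
    return pairs


def toy_generate(prompt: str, seed: int) -> Candidate:
    # Hypothetical generator: a real system would sample audio from the model.
    return (prompt, seed)


def toy_score(prompt: str, candidate: Candidate) -> float:
    # Hypothetical scorer: a deterministic placeholder for CLAP similarity.
    _, seed = candidate
    return float((seed * 7) % 5)


if __name__ == "__main__":
    for prompt, winner, loser in build_preference_pairs(
        ["dog barking"], toy_generate, toy_score
    ):
        print(prompt, "winner:", winner, "loser:", loser)
```

In an iterative setup, pairs built this way would be used to update the model, and the updated model would then generate the candidates for the next round. That loop is only outlined here, not implemented.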

TangoFlux Visit Over Time

Monthly Visits: 4420
Bounce Rate: 49.52%
Pages per Visit: 1.1
Visit Duration: 00:00:00


TangoFlux Alternatives