MelodyFlow
High-fidelity text-guided music generation and editing model
Premium, New Product, Music, Music Generation, Text-guided
MelodyFlow is a high-fidelity, text-controllable model for music generation and editing. It operates on sequences of continuous latent representations, which avoids the information loss associated with discrete representations. Built on a diffusion transformer architecture and trained with a flow matching objective, the model generates and edits diverse, high-quality stereo samples from simple text descriptions. MelodyFlow also explores a novel regularized latent inversion method for zero-shot, test-time text-guided editing, which shows superior performance across a variety of music editing prompts. Objective and subjective evaluations indicate that the model matches established baselines in quality and efficiency on standard text-to-music benchmarks while surpassing the previous state of the art in music editing.
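
To make the flow matching idea in the description more concrete, here is a minimal NumPy sketch. It is not MelodyFlow's implementation: it assumes the common straight-line path between a noise sample and a data sample, and it replaces the diffusion transformer with a hypothetical "oracle" velocity function so the example stays self-contained. The function names, array shapes, and step count are all illustrative assumptions, and text conditioning is omitted.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Regression target for the velocity field along the straight-line path."""
    return x1 - x0

def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    error = predict_velocity(x_t, t) - target_velocity(x0, x1)
    return float(np.mean(error ** 2))

def euler_sample(predict_velocity, x0, num_steps=8):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * predict_velocity(x, i * dt)
    return x

rng = np.random.default_rng(0)
x1 = rng.standard_normal((4, 16))  # stand-in for a continuous latent sequence (frames x channels)
x0 = rng.standard_normal((4, 16))  # Gaussian noise sample

# Hypothetical "perfect" model for this single (x0, x1) pair; a trained
# network would approximate this velocity from x_t, t, and the text prompt.
def oracle(x, t):
    return x1 - x0

loss = flow_matching_loss(oracle, x0, x1, t=0.3)
generated = euler_sample(oracle, x0)
```

With the oracle in place of a trained network, the loss is zero and the Euler integration carries the noise sample onto the data sample. In a real system the velocity would be predicted by a learned model and the result decoded from latents back to audio.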