ByteDance recently released Seed-Music, a music generation model that lets you create music from several kinds of input: text descriptions, audio references, sheet music, or even voice prompts.

Seed-Music combines autoregressive language models with diffusion models, so it can generate high-quality music while still giving precise control over musical details. It can set lyrics to music or adapt existing melodies. You can also upload a short voice clip, and the system will convert it into a complete song.

Seed-Music supports both vocal and instrumental generation, with features that include singing synthesis, voice conversion, and music editing to suit different users. You can generate pop music from a simple text description or steer the musical style with an audio cue.

Seed-Music's architecture is divided into three modules: representation learning, generation, and rendering. These modules work together like a band, turning multimodal inputs into high-quality music.

The representation learning module compresses raw audio signals into three intermediate representations suitable for different music generation and editing tasks. The generation module transforms user inputs into musical representations using autoregressive and diffusion models. The final rendering module converts these intermediate representations into high-quality audio for your enjoyment.
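The hand-off between the three modules can be pictured with a toy sketch. This is not Seed-Music's code: every name here (`Intermediate`, `compress`, `generate`, `render`) is hypothetical, the "tokens" are just small integers, and the "audio" is a list of floats. It only illustrates the data flow described above, assuming a single token-based intermediate representation.

```python
from dataclasses import dataclass
from typing import List

LEVELS = 16  # hypothetical vocabulary size for the toy tokens


@dataclass
class Intermediate:
    """Hypothetical stand-in for one intermediate musical representation."""
    kind: str          # illustrative label only
    values: List[int]  # toy token ids in [0, LEVELS)


def compress(wave: List[float], samples_per_token: int = 4) -> Intermediate:
    """Toy 'representation learning': squeeze each frame of audio into one token."""
    tokens = []
    for start in range(0, len(wave), samples_per_token):
        frame = wave[start:start + samples_per_token]
        mean = sum(frame) / len(frame)
        tokens.append(round(mean * (LEVELS - 1)))  # quantize to a token id
    return Intermediate(kind="audio_tokens", values=tokens)


def generate(prompt: str, length: int = 8) -> Intermediate:
    """Toy 'generation': map user input to a token sequence (no model involved)."""
    codes = [ord(c) % LEVELS for c in prompt] or [0]
    tokens = [codes[i % len(codes)] for i in range(length)]
    return Intermediate(kind="audio_tokens", values=tokens)


def render(rep: Intermediate, samples_per_token: int = 4) -> List[float]:
    """Toy 'rendering': expand each token back into waveform-like samples."""
    wave: List[float] = []
    for token in rep.values:
        amplitude = token / (LEVELS - 1)  # scale the token id into [0, 1]
        wave.extend([amplitude] * samples_per_token)
    return wave


if __name__ == "__main__":
    rep = generate("upbeat pop song")
    audio = render(rep)
    print(len(rep.values), "tokens ->", len(audio), "samples")
```

In this toy, `compress` and `render` are rough inverses of each other, which mirrors the idea that the representation module defines a compact form that the rendering module can turn back into audio.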

To ensure music quality, Seed-Music combines several techniques: autoregressive language models generate audio tokens step by step, diffusion models refine the output through iterative denoising, and vocoders convert these musical "codes" into high-fidelity, playable audio.
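To make "step by step" concrete, here is a minimal sketch of autoregressive generation, where each new token is chosen based on what has been produced so far. The score table `NEXT_SCORES` and the note names are invented for illustration. A real model conditions on the whole prefix with a neural network and usually samples rather than always taking the top choice, whereas this toy looks only at the previous token and decodes greedily.

```python
from typing import Dict, List

# Hypothetical scores: NEXT_SCORES[previous][candidate] = preference for candidate.
NEXT_SCORES: Dict[str, Dict[str, float]] = {
    "<start>": {"C": 0.6, "E": 0.3, "G": 0.1},
    "C": {"E": 0.7, "G": 0.2, "<end>": 0.1},
    "E": {"G": 0.8, "C": 0.1, "<end>": 0.1},
    "G": {"<end>": 0.6, "C": 0.4},
}


def generate_tokens(max_len: int = 8) -> List[str]:
    """Greedy autoregressive decoding over the toy score table."""
    tokens: List[str] = []
    previous = "<start>"
    for _ in range(max_len):
        candidates = NEXT_SCORES[previous]
        # Pick the highest-scoring continuation given the previous token.
        best = max(candidates, key=candidates.get)
        if best == "<end>":
            break  # the toy model decided the phrase is finished
        tokens.append(best)
        previous = best  # feed the choice back in for the next step
    return tokens


if __name__ == "__main__":
    print(generate_tokens())
```

The loop is the important part: each output is fed back in as context for the next step, which is why this style of generation is sequential.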

Seed-Music's training is divided into three phases: pretraining, fine-tuning, and post-training. Pretraining on extensive music data gives the model its basic capabilities, fine-tuning improves its performance on specific tasks, and post-training uses reinforcement learning to further optimize the generated results.

Project Link: https://team.doubao.com/en/special/seed-music