JASCO

Music generation model that combines text and audio conditioning.

JASCO is a text-to-music generation model that combines symbolic and audio-based conditioning. It generates music samples from a global text description together with fine-grained local controls. Built on the flow matching modeling paradigm and a novel conditioning method, JASCO lets generation be steered simultaneously by local cues (e.g., chords) and global cues (the text description). Information bottleneck layers and temporal blurring restrict each conditioning signal to the information relevant to its specific control, which is what allows symbolic and audio-based conditions to be combined within a single text-to-music model.
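To make the idea of temporal blurring concrete, here is a minimal sketch, not JASCO's actual implementation. It assumes blurring means averaging a sequence of per-frame conditioning features over fixed, non-overlapping windows and repeating each average so the sequence keeps its length. The function name `temporal_blur` and the `window` parameter are hypothetical and chosen for illustration only.

```python
from typing import List


def temporal_blur(frames: List[List[float]], window: int) -> List[List[float]]:
    """Illustrative only: coarsen a feature sequence along the time axis.

    Each non-overlapping group of `window` frames is replaced by its mean,
    repeated once per frame in the group, so the output has the same length
    as the input but carries less fine-grained temporal detail.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    blurred: List[List[float]] = []
    for start in range(0, len(frames), window):
        chunk = frames[start:start + window]  # last chunk may be shorter
        dim = len(chunk[0])
        mean = [sum(frame[d] for frame in chunk) / len(chunk) for d in range(dim)]
        blurred.extend(list(mean) for _ in chunk)
    return blurred


# Five frames of a 2-dimensional conditioning feature (made-up values).
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]
coarse = temporal_blur(frames, window=2)
```

A larger `window` discards more local detail, so under this assumption the conditioning signal can convey only slowly varying structure rather than frame-level content.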

JASCO Alternatives