UniMuMo
Unified model for text, music, and motion generation.
Categories: Common Product, Music, Artificial Intelligence, Machine Learning
UniMuMo is a multimodal model that takes any combination of text, music, and motion data as input conditions and generates outputs across all three modalities. The model bridges these modalities by converting music, motion, and text into token-based representations handled by a unified encoder-decoder architecture. Because it fine-tunes existing pretrained unimodal models rather than training from scratch, it significantly reduces computational requirements. UniMuMo achieves competitive results on all unidirectional generation benchmarks across the music, motion, and text modalities.
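The token-based bridging idea can be illustrated with a minimal sketch: each modality is mapped to discrete tokens, embedded into a shared space, and a single encoder-decoder predicts tokens of the target modality conditioned on the source modality. The sketch below is an illustrative assumption only; the class, vocabulary sizes, and layer counts are hypothetical and do not reflect the actual UniMuMo implementation or API.

```python
# Hypothetical sketch of a unified token-based encoder-decoder (not the UniMuMo code).
import torch
import torch.nn as nn

VOCAB = 1024        # assumed shared codebook size for music/motion tokens
TEXT_VOCAB = 5000   # assumed text vocabulary size
DIM = 256

class UnifiedEncoderDecoder(nn.Module):
    """Toy encoder-decoder that consumes source-modality tokens and
    models target-modality tokens in a shared embedding space."""
    def __init__(self):
        super().__init__()
        # one embedding table per modality, projected into the same dimension
        self.emb = nn.ModuleDict({
            "text": nn.Embedding(TEXT_VOCAB, DIM),
            "music": nn.Embedding(VOCAB, DIM),
            "motion": nn.Embedding(VOCAB, DIM),
        })
        self.transformer = nn.Transformer(
            d_model=DIM, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.head = nn.Linear(DIM, VOCAB)  # logits over target-modality tokens

    def forward(self, src_tokens, tgt_tokens, src_modality="text", tgt_modality="music"):
        src = self.emb[src_modality](src_tokens)
        tgt = self.emb[tgt_modality](tgt_tokens)
        hidden = self.transformer(src, tgt)
        return self.head(hidden)

# Usage: condition music-token prediction on a (dummy) tokenized text prompt.
model = UnifiedEncoderDecoder()
text_tokens = torch.randint(0, TEXT_VOCAB, (1, 16))   # stand-in for a caption
music_tokens = torch.randint(0, VOCAB, (1, 32))       # stand-in for audio tokens
logits = model(text_tokens, music_tokens)
print(logits.shape)  # (1, 32, 1024)
```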
UniMuMo Visit Over Time
Monthly Visits: 193
Bounce Rate: 49.88%
Pages per Visit: 1.0
Visit Duration: 00:00:00