EMOVA
Emotionally Rich Multimodal Language Model
EMOVA (EMotionally Omni-present Voice Assistant) is a multimodal language model that processes speech end to end while maintaining state-of-the-art vision-language performance. It supports emotionally rich spoken dialogue through a speech tokenizer that decouples semantic content from acoustic style, and it reports state-of-the-art results on both vision-language and speech benchmarks.
EMOVA Visit Over Time
Monthly Visits: 1153
Bounce Rate: 59.77%
Pages per Visit: 1.0
Visit Duration: 00:00:00