EMOVA
Emotionally Rich Multimodal Language Model
EMOVA (EMotionally Omni-present Voice Assistant) is a multimodal language model that processes speech end to end while maintaining state-of-the-art vision-language performance. It supports emotionally rich spoken dialogue through a speech tokenizer that decouples semantic content from acoustic style, and it reports state-of-the-art results on both vision-language and speech benchmarks.
EMOVA Visits Over Time
Monthly Visits: 2199
Bounce Rate: 0.30%
Pages per Visit: 2.0
Visit Duration: 00:07:14