EMOVA

Emotionally Rich Multimodal Language Model

EMOVA (EMotionally Omni-present Voice Assistant) is a multimodal language model that processes speech end to end while maintaining state-of-the-art vision-language performance. It produces emotionally rich spoken dialogue by using a semantic-acoustic disentangled speech tokenizer, and it reports state-of-the-art results on both vision-language and speech benchmarks.

EMOVA Visit Over Time

Monthly Visits: 2199
Bounce Rate: 0.30%
Pages per Visit: 2.0
Visit Duration: 00:07:14

EMOVA Alternatives