EMOVA

Emotionally Rich Multimodal Language Model

Common Product, Others, Multimodal, Speech Recognition
EMOVA (Emotionally Omni-present Voice Assistant) is a multimodal language model that performs end-to-end speech processing while maintaining state-of-the-art vision-language performance. It supports emotionally rich spoken dialogue through a semantic-acoustic decoupled speech tokenizer, and it achieves state-of-the-art results on both vision-language and speech benchmarks.
EMOVA Visit Over Time

Monthly Visits: 1153
Bounce Rate: 59.77%
Pages per Visit: 1.0
Visit Duration: 00:00:00

EMOVA Alternatives