MEMO
An audio-driven model for generating expressive talking videos.
Common Product | Video | Video Generation | Audio-Driven
MEMO is an open-weight model for audio-driven talking video generation. It combines two components: a memory-guided temporal module, which improves long-term identity consistency and motion smoothness, and an emotion-aware audio module, which refines facial expressions to match the emotion detected in the audio. Its main advantages are more realistic video generation, better audio-lip synchronization, stronger identity consistency, and facial expressions aligned with the emotion of the audio. According to its technical background, MEMO generates more realistic talking videos across a variety of image and audio types and outperforms existing state-of-the-art methods.