EMAGE is a unified model for generating holistic co-speech gestures. It produces natural, expressive gestures through expressive masked audio gesture modeling. From the audio input, it captures speech content and rhythm and generates the corresponding body pose and hand gesture sequences. Because EMAGE can produce highly dynamic and expressive gestures, it makes virtual characters more engaging to interact with.
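The general idea behind masked gesture modeling is to hide part of a motion sequence and train a model to reconstruct the hidden part. The sketch below shows only that masking and loss step. It is not EMAGE's implementation. The frame layout (a list of per-frame joint values), the zero-filled stand-in for a learned mask token, the mask ratio, and the function names `mask_frames` and `masked_reconstruction_loss` are all hypothetical. The audio conditioning and the network itself are omitted.

```python
import random


def mask_frames(motion, mask_ratio, seed=None):
    """Randomly hide a fraction of frames in a motion sequence.

    Hypothetical layout: motion is a list of frames, each a list of joint values.
    Returns (masked_motion, mask), where mask[t] is True if frame t was hidden.
    """
    rng = random.Random(seed)
    num_frames = len(motion)
    num_masked = int(round(num_frames * mask_ratio))
    masked_idx = set(rng.sample(range(num_frames), num_masked))
    mask = [t in masked_idx for t in range(num_frames)]
    dim = len(motion[0]) if motion else 0
    # Hidden frames are replaced with zeros, a hypothetical stand-in for a learned mask token.
    masked_motion = [[0.0] * dim if m else list(frame) for frame, m in zip(motion, mask)]
    return masked_motion, mask


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error computed only over the hidden (masked) frames."""
    total, count = 0.0, 0
    for p, t, m in zip(pred, target, mask):
        if m:
            total += sum((a - b) ** 2 for a, b in zip(p, t))
            count += len(t)
    return total / count if count else 0.0


# Usage: mask 40% of a toy 10-frame, 2-value sequence.
motion = [[float(t), float(t) * 0.5] for t in range(10)]
masked_motion, mask = mask_frames(motion, mask_ratio=0.4, seed=0)
```

Restricting the loss to masked frames is the standard masked-modeling choice: the model earns no credit for copying frames it can already see, so it has to infer the hidden motion from context (in EMAGE's case, context that includes the audio).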