JoyVASA is an audio-driven character animation technique based on diffusion models. It generates facial dynamics and head motion by decoupling dynamic facial expressions from static 3D facial representations, which allows motion sequences to be produced independently of character identity. This design improves video quality and lip-sync accuracy, extends naturally to animal facial animation, supports multiple languages, and makes training and inference more efficient. Key advantages of JoyVASA include the ability to generate longer videos, identity-independent motion sequence generation, and high-quality animation rendering.
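The decoupling described above can be illustrated with a minimal sketch. This is not the actual JoyVASA implementation or API; all names here (`generate_motion`, `render`, `MotionFrame`) are hypothetical stand-ins meant only to show the key structural idea: the motion-generation stage consumes audio alone, so the same motion sequence can drive any static appearance, human or animal.

```python
# Conceptual sketch (hypothetical names, NOT the real JoyVASA code) of the
# two-stage pipeline: stage 1 maps audio features to an identity-independent
# motion sequence; stage 2 renders that motion onto a static appearance.

from dataclasses import dataclass
from typing import List

@dataclass
class MotionFrame:
    expression: List[float]  # dynamic facial-expression coefficients
    head_pose: List[float]   # head rotation/translation parameters

def generate_motion(audio_features: List[float]) -> List[MotionFrame]:
    """Stage 1 stand-in: produce motion from audio ONLY.

    In JoyVASA this stage is a diffusion model; here each audio feature
    deterministically yields one frame, to make the point that no identity
    information is consumed when generating motion.
    """
    return [MotionFrame(expression=[a * 0.5], head_pose=[a * 0.1])
            for a in audio_features]

def render(appearance: str, motion: List[MotionFrame]) -> List[str]:
    """Stage 2 stand-in: combine a static appearance with the motion.

    Because the motion carries no identity, the same sequence can drive
    different characters (e.g. a human portrait or an animal face).
    """
    return [f"{appearance}: expr={m.expression[0]:.2f} pose={m.head_pose[0]:.2f}"
            for m in motion]

audio = [0.2, 0.8, 0.4]
motion = generate_motion(audio)            # identity never enters here
human_frames = render("portrait", motion)  # same motion, two identities
cat_frames = render("cat", motion)
```

The point of the sketch is the data flow, not the math: swapping `"portrait"` for `"cat"` changes only the rendering stage, mirroring how JoyVASA reuses one audio-driven motion sequence across character identities.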