Recently, a research team from the University of Toronto and the Vector Institute released CAP4D, a new model that uses a morphable multi-view diffusion model (MMDM) to generate realistic 4D avatars from any number of reference images.

The model employs a two-stage approach: the MMDM first generates images of the subject from different viewpoints and with different expressions, and these generated images are then combined with the reference images to reconstruct a 4D avatar that can be controlled in real time.

In the CAP4D workflow, users can input any number of reference images, which are encoded into the latent space of a variational autoencoder. An existing face-tracking method, FlowFace, is then used to fit the FLAME 3D morphable model to each reference image, extracting information such as head pose, expression, and camera parameters. At each iteration of the generation process, the MMDM then produces multiple different images via random sampling, conditioned on the input reference images.
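
As a rough illustration of that workflow, here is a minimal Python sketch. Every name in it (encode_latent, track_flame, mmdm_sample, reconstruct_4d) is a hypothetical placeholder standing in for the components described above, not CAP4D's released code.

```python
# A minimal, hypothetical sketch of the CAP4D workflow described above.
# Every name here is an illustrative stand-in, not the released CAP4D API.
from dataclasses import dataclass, field

@dataclass
class FlameEstimate:
    """Per-image state recovered by the face tracker (FLAME 3DMM)."""
    head_pose: list = field(default_factory=lambda: [0.0, 0.0, 0.0])
    expression: list = field(default_factory=lambda: [0.0] * 10)
    camera: list = field(default_factory=lambda: [1.0])

def encode_latent(image):
    """Stand-in for the VAE encoder: maps an image to a latent code."""
    return {"latent_of": image}

def track_flame(image):
    """Stand-in for FlowFace: fits FLAME pose/expression/camera to one image."""
    return FlameEstimate()

def mmdm_sample(latents, flame_states, num_views=8):
    """Stand-in for the MMDM: generated views are conditioned on the
    reference latents and FLAME parameters during sampling."""
    return [f"generated_view_{i}" for i in range(num_views)]

def reconstruct_4d(images, flame_states):
    """Stand-in for stage two: fit a real-time controllable 4D avatar."""
    return {"source_images": len(images), "controls": "FLAME"}

def cap4d_pipeline(reference_images):
    latents = [encode_latent(img) for img in reference_images]   # VAE encoding
    flame = [track_flame(img) for img in reference_images]       # FlowFace tracking
    generated = mmdm_sample(latents, flame)                      # stage 1: MMDM sampling
    return reconstruct_4d(list(reference_images) + generated, flame)  # stage 2

print(cap4d_pipeline(["ref_0.png", "ref_1.png"]))
```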

The research team demonstrated a variety of avatars generated by CAP4D, covering scenarios with a single reference image, a few reference images, and the more challenging cases of generating avatars from text prompts or artwork. By using multiple reference images, the model can recover details and geometry that are not visible in any single image, improving reconstruction quality. Additionally, CAP4D can be combined with existing image editing models, allowing users to edit the appearance and lighting of the generated avatars.

To further enhance the expressiveness of avatars, CAP4D can pair the generated 4D avatars with voice-driven animation models to achieve audio-driven animation. This allows avatars not only to be rendered as static visuals but also to interact dynamically with users through speech, opening up new avenues for virtual avatar applications.
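
To make the audio-driving step concrete, here is a small hypothetical sketch: speech_to_expression and render_frame are illustrative placeholders for an off-the-shelf voice-driven animation model and the avatar renderer, not CAP4D's actual interface.

```python
# Hypothetical sketch of audio-driven animation: a speech-to-expression
# model maps audio features to FLAME expression coefficients, which then
# drive the reconstructed 4D avatar frame by frame. All names are
# placeholders, not CAP4D's actual interface.
def speech_to_expression(audio_feature):
    """Stand-in for a voice-driven animation model."""
    return [0.0] * 10  # FLAME expression coefficients

def render_frame(avatar, expression):
    """Stand-in for rendering one frame of the 4D avatar."""
    return {"avatar": avatar, "expression": expression}

def animate_from_audio(avatar, audio_features):
    """Produce one rendered frame per audio feature window."""
    return [render_frame(avatar, speech_to_expression(a)) for a in audio_features]

frames = animate_from_audio("my_avatar", audio_features=[[0.1], [0.2], [0.3]])
print(len(frames))  # 3 frames
```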

Key Highlights:

🌟 The CAP4D model can generate high-quality 4D avatars using any number of reference images, employing a two-stage workflow.  

🖼️ The technology can generate avatars from multiple different viewpoints, significantly improving reconstruction quality and detail.  

🎤 CAP4D integrates with voice-driven animation models to achieve audio-driven dynamic avatars, expanding the application scenarios for virtual avatars.