GenXD
An advanced framework for generating any 3D or 4D scene.
Common Product, Image, 3D Generation, 4D Generation
GenXD is a framework for 3D and 4D scene generation. It studies general 3D and 4D generation jointly by leveraging the camera and object motion commonly observed in everyday life. Because the community lacks large-scale real-world 4D data, GenXD first proposes a data curation pipeline that extracts camera poses and object motion strength from videos. Based on this pipeline, GenXD introduces a large-scale real-world 4D scene dataset, CamVid-30K. Trained on all available 3D and 4D data, the GenXD framework can generate any 3D or 4D scene. Its multiview-temporal modules disentangle camera motion from object motion, so the model learns seamlessly from both 3D and 4D data. GenXD also uses masked latent conditions to support a variable number of conditioning views. It can generate videos that follow a given camera trajectory, as well as consistent 3D views that can be lifted into 3D representations. Extensive evaluation on real-world and synthetic datasets demonstrates its effectiveness and versatility in 3D and 4D generation compared with previous methods.
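To make the idea of masked latent conditions more concrete, here is a minimal sketch of one common way to set up such conditioning in a latent diffusion model. It is not GenXD's actual implementation. The function name `build_model_input`, the flat-list latent layout, and the choice to replace target views with Gaussian noise are all assumptions made for illustration.

```python
import random


def build_model_input(latents, cond_views, seed=0):
    """Hypothetical helper (not from GenXD): pack view latents with a mask.

    latents    : list of per-view latents, each a flat list of floats
    cond_views : set of view indices supplied as conditions
    Returns a list of (latent, mask) pairs, one per view, where mask is
    1.0 for a conditioning view and 0.0 for a view to be generated.
    """
    rng = random.Random(seed)
    model_input = []
    for idx, latent in enumerate(latents):
        if idx in cond_views:
            # Conditioning view: keep the clean latent and flag it.
            model_input.append((list(latent), 1.0))
        else:
            # Target view: replace the latent with Gaussian noise.
            noise = [rng.gauss(0.0, 1.0) for _ in latent]
            model_input.append((noise, 0.0))
    return model_input


# Four views with two latent values each; views 0 and 3 are conditions.
views = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
packed = build_model_input(views, cond_views={0, 3})
masks = [mask for _, mask in packed]
print(masks)  # [1.0, 0.0, 0.0, 1.0]
```

Because the mask is just a per-view flag, the same routine handles one conditioning view, several, or none, which is the flexibility the description attributes to masked conditioning.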