Scientists from ShanghaiTech University have recently developed an artificial intelligence model named CLAY that generates detailed 3D objects from text descriptions or 2D images. Compared with earlier approaches, CLAY marks a significant advance in both the quality and the diversity of the 3D objects it generates.
At the core of CLAY are a multi-resolution variational autoencoder (VAE) and a diffusion transformer (DiT). The VAE encodes 3D geometry at different levels of detail into a latent space, and the DiT generates new shapes in that latent space. Unlike many other systems, CLAY works on 3D content directly rather than first converting it to 2D images.
CLAY was trained on more than 500,000 3D models, ranging from simple everyday objects to complex fantasy creatures. Generation can also be steered with additional inputs: users can specify a rough shape (such as a voxel structure or a point cloud) or a bounding box to control the result precisely. This flexibility lets CLAY generate entire city scenes or even reconstruct detailed 3D models from hand-drawn sketches.
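To make the coarse control inputs mentioned above more concrete, here is a minimal sketch of how a point cloud can be reduced to a bounding box and a low-resolution voxel occupancy grid. It is an illustration only: the article does not describe the input format CLAY or Rodin actually expects, and the names `bounding_box` and `voxelize` and the `resolution` parameter are hypothetical.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def bounding_box(points: List[Point]) -> Tuple[Point, Point]:
    """Return the axis-aligned (min corner, max corner) of a point cloud."""
    if not points:
        raise ValueError("point cloud must not be empty")
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def voxelize(points: List[Point], resolution: int = 16) -> Set[Voxel]:
    """Map points to the set of occupied cells in a resolution^3 grid.

    Assumption for illustration: the grid is fitted to the cloud's own
    bounding box, using the longest edge so that cells stay cubic.
    """
    lo, hi = bounding_box(points)
    # Fall back to 1.0 when all points coincide, to avoid dividing by zero.
    extent = max(h - l for l, h in zip(lo, hi)) or 1.0
    occupied: Set[Voxel] = set()
    for p in points:
        cell = tuple(
            # Clamp so points on the far boundary land in the last cell.
            min(int((c - l) / extent * resolution), resolution - 1)
            for c, l in zip(p, lo)
        )
        occupied.add(cell)
    return occupied


# Tiny example: three points along the diagonal of a unit cube.
cloud = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.5, 0.5, 0.5)]
box = bounding_box(cloud)
grid = voxelize(cloud, resolution=4)
print(box)
print(sorted(grid))
```

A coarse grid like this fixes only the rough silhouette of an object, which is why such inputs work as a guide for the generator rather than as a finished model.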
Compared with other systems such as Shap-E, DreamFusion, and Wonder3D, CLAY shows clear advantages. For both text-to-3D and image-to-3D generation, it produces more consistent geometry with smoother surfaces and finer details. It is also fast: generating a high-quality 3D asset takes only about 45 seconds, whereas some competing systems may require several hours of optimization.
CLAY has broad potential applications, including game development, film production, and 3D printing. The researchers are nevertheless aware of the potential risks of AI-generated virtual content and plan to add further safety measures to ensure responsible use.
Looking ahead, the researchers plan to expand the training data, improve model quality, and integrate geometry generation and material synthesis into a single model for more comprehensive functionality. A version of CLAY is available through the 3D-Gen service Rodin.
Product Access: https://hyperhuman.deemos.com/rodin
### Key Takeaways:
- 🏆 **CLAY's Breakthrough in 3D Generation Technology**: CLAY can generate detailed 3D objects from text and images, with superior quality and speed compared to previous technologies.
- ⚡ **Fast Generation Speed**: CLAY can generate a high-quality 3D asset in about 45 seconds, whereas some competing systems may need several hours of optimization.
- 🎮 **Wide-ranging Applications**: CLAY has the potential to play a significant role in various fields such as game development, film production, and 3D printing.