A research team from the Czech Technical University in Prague and ETH Zurich has introduced a method called "WildGaussians," which extends 3D Gaussian Splatting (3DGS) to handle unstructured image collections. The advance makes high-quality 3D reconstruction possible from in-the-wild photo sets, such as landmark photos gathered from the internet.
WildGaussians primarily addresses two key challenges: changes in appearance and lighting, and occlusions caused by transient objects. The research team tackled these challenges with two core components: appearance modeling and uncertainty modeling.
Appearance modeling allows the system to handle images captured under varying conditions, such as different times of day or weather. The method attaches a trainable embedding to each training image and to each 3D Gaussian, and a multilayer perceptron (MLP) uses these embeddings to adjust each Gaussian's color to match the corresponding capture conditions.
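The idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the embedding sizes, the hidden-layer width, and the choice of predicting an affine color transform (scale and offset) are assumptions made for illustration, and the weights here are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed (hypothetical) dimensions for the sketch.
D_IMG, D_GAUSS, HIDDEN = 8, 4, 16
N_IMAGES, N_GAUSSIANS = 100, 50_000

# Trainable parameters in the real system; randomly initialized here.
img_embeddings = rng.normal(size=(N_IMAGES, D_IMG))        # one per training image
gauss_embeddings = rng.normal(size=(N_GAUSSIANS, D_GAUSS)) # one per 3D Gaussian
W1 = rng.normal(size=(D_IMG + D_GAUSS + 3, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.normal(size=(HIDDEN, 6)) * 0.1
b2 = np.zeros(6)

def appearance_adjusted_color(base_rgb, img_idx, gauss_idx):
    """Condition a Gaussian's color on an image's appearance embedding.

    The MLP maps (image embedding, Gaussian embedding, base color) to an
    affine color transform: c' = scale * c + offset.
    """
    x = np.concatenate([img_embeddings[img_idx],
                        gauss_embeddings[gauss_idx],
                        base_rgb])
    h = np.maximum(x @ W1 + b1, 0.0)        # ReLU hidden layer
    out = h @ W2 + b2
    scale, offset = 1.0 + out[:3], out[3:]  # small perturbation of identity
    return np.clip(scale * base_rgb + offset, 0.0, 1.0)

c = appearance_adjusted_color(np.array([0.6, 0.5, 0.4]), img_idx=3, gauss_idx=42)
```

Because every image carries its own embedding, the same Gaussian can render with warm evening tones in one photo and overcast daylight tones in another, while the underlying geometry stays shared.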
Uncertainty modeling helps the system identify and ignore occluders such as pedestrians or cars during training. The researchers build this uncertainty estimate on pre-trained DINOv2 features, which makes it robust to appearance changes, so a scene photographed under different lighting is not mistaken for an occlusion.
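A rough sketch of the underlying mechanism: where semantic features of the rendered image and the training photo disagree, the photometric loss is downweighted, so transient occluders contribute little to training. The cosine-similarity weighting below is an assumption for illustration, not the paper's exact uncertainty formulation, and the feature maps stand in for DINOv2 outputs.

```python
import numpy as np

def uncertainty_weighted_loss(render, target, feat_render, feat_target, eps=1e-8):
    """Weighted L1 loss that ignores pixels whose semantic features disagree.

    render, target:           (H, W, 3) RGB images
    feat_render, feat_target: (H, W, C) per-pixel feature maps (DINOv2-like)
    """
    # Per-pixel cosine similarity between the two feature maps.
    num = (feat_render * feat_target).sum(axis=-1)
    den = (np.linalg.norm(feat_render, axis=-1)
           * np.linalg.norm(feat_target, axis=-1) + eps)
    weight = np.clip(num / den, 0.0, 1.0)  # low similarity -> likely occluder -> weight ~0

    per_pixel_err = np.abs(render - target).mean(axis=-1)
    return float((weight * per_pixel_err).sum() / (weight.sum() + eps))
```

In this toy form, a pedestrian that appears in the photo but not in the render produces both a color error and a feature mismatch at the same pixels, so those pixels are masked out instead of corrupting the scene's Gaussians.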
In terms of performance, WildGaussians excels on challenging datasets such as NeRF On-the-go and Photo Tourism, surpassing prior state-of-the-art methods. The method also achieves real-time rendering at 117 frames per second on an NVIDIA RTX 4090 GPU.
Although WildGaussians marks significant progress in 3D reconstruction, the researchers acknowledge that the method still has limitations, such as faithfully representing specular highlights on objects. They plan to improve it further by integrating techniques such as diffusion models.
This research opens up new possibilities for robust, versatile, and photorealistic 3D reconstruction from noisy user-generated data, with the potential to have a profound impact in multiple fields including virtual reality, augmented reality, and computer vision.