NVIDIA, in collaboration with the University of Maryland, has developed an artificial intelligence model called QUEEN for free-viewpoint video streaming. The technology lets users view 3D scenes from any angle, bringing a new level of efficiency and visual quality to real-time reconstruction and streaming of dynamic scenes.
The QUEEN model stands out for its combination of speed, compactness, and visual quality: it trains in under 5 seconds, renders at roughly 350 FPS, and requires only about 0.7 MB of model data per frame. This opens up possibilities for immersive virtual reality experiences, media broadcasts of concerts and sports events, and instant replays of key moments.
Shalini De Mello, director of research at NVIDIA, emphasized that QUEEN balances compression rate, visual quality, encoding time, and rendering time by reconstructing and compressing 3D scenes simultaneously, creating an optimized pipeline that sets a new standard for visual quality and streamability.
The innovation of the QUEEN model lies in how it handles dynamic versus static content. It concentrates reconstruction effort on the content that changes over time while reusing the already-rendered static regions, saving computation. This approach not only improves efficiency but also suits streaming applications, letting QUEEN render faster than previous methods.
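As a rough illustration of this dynamic/static split (the change mask and the way it is constructed here are hypothetical stand-ins, not QUEEN's actual criterion), one might keep per-Gaussian attributes fixed wherever the scene has not changed and only refresh the rest:

```python
import torch

def update_frame(prev_attrs: torch.Tensor,
                 new_attrs: torch.Tensor,
                 change_mask: torch.Tensor) -> torch.Tensor:
    """Reuse static Gaussians, overwrite only the dynamic ones.

    prev_attrs:  (N, D) attributes carried over from the previous frame
    new_attrs:   (N, D) freshly optimized attributes for the current frame
    change_mask: (N,) boolean flag marking Gaussians whose region changed (hypothetical)
    """
    out = prev_attrs.clone()
    out[change_mask] = new_attrs[change_mask]  # only dynamic content is refreshed
    return out

# Toy example: 5 Gaussians with 3 attributes each; only #1 and #3 changed.
prev = torch.zeros(5, 3)
new = torch.ones(5, 3)
mask = torch.tensor([False, True, False, True, False])
print(update_frame(prev, new, mask))
```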
QUEEN has broad application prospects, from immersive streaming and skills instruction to multi-angle viewing of sporting events and remote operation in industrial environments. The research team also proposes a new framework for streaming free-viewpoint video with 3D Gaussian splatting, in which QUEEN directly learns the residuals of the Gaussian attributes between consecutive frames at each time step, without imposing any structural constraints, and demonstrates high-quality reconstruction and generalizability.
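Conceptually, this kind of per-frame residual learning can be sketched as follows. The loop below is a toy illustration only: the target and the mean-squared loss are placeholders for the rendering-based supervision the actual method would use.

```python
import torch

# Previous frame's Gaussian attributes (positions, colors, opacities, ...), kept frozen.
num_gaussians, attr_dim = 10_000, 14
prev_attrs = torch.randn(num_gaussians, attr_dim)

# The residuals for the current frame are the only learnable quantities.
residuals = torch.zeros_like(prev_attrs, requires_grad=True)
optimizer = torch.optim.Adam([residuals], lr=1e-2)

# Stand-in target; in practice the residuals would be fit to the new frame's views.
target = prev_attrs + 0.1 * torch.randn_like(prev_attrs)

for step in range(100):
    optimizer.zero_grad()
    current_attrs = prev_attrs + residuals          # current frame = previous frame + residual
    loss = ((current_attrs - target) ** 2).mean()   # placeholder for a rendering loss
    loss.backward()
    optimizer.step()
```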
To store the residuals efficiently, the team further proposes a quantization-sparsity framework: a learned latent decoder efficiently quantizes the residuals of all attributes other than the Gaussian positions, while a learned gating module sparsifies the position residuals. With this design, QUEEN outperforms state-of-the-art online free-viewpoint video (FVV) methods across all metrics on a range of FVV benchmarks.
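In spirit, the two components might look like the sketch below. The module sizes, the rounding scheme, and the straight-through gate are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class LatentResidualCodec(nn.Module):
    """Learn a compact latent per Gaussian and decode it into the non-position attribute residual."""
    def __init__(self, latent_dim: int = 8, attr_dim: int = 11):
        super().__init__()
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                     nn.Linear(32, attr_dim))

    def forward(self, latent: torch.Tensor) -> torch.Tensor:
        # Round the latent to integers; the straight-through trick keeps gradients flowing.
        quantized = latent + (torch.round(latent) - latent).detach()
        return self.decoder(quantized)

class PositionGate(nn.Module):
    """Gate that zeroes out position residuals for Gaussians that did not move."""
    def __init__(self, feat_dim: int = 11):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, 1)

    def forward(self, features: torch.Tensor, pos_residual: torch.Tensor) -> torch.Tensor:
        soft = torch.sigmoid(self.scorer(features))  # soft gate in (0, 1)
        hard = (soft > 0.5).float()                  # binary mask enforcing sparsity
        gate = hard + (soft - soft.detach())         # straight-through gradient estimator
        return gate * pos_residual

codec, gate = LatentResidualCodec(), PositionGate()
latents = torch.randn(1000, 8)        # per-Gaussian latent codes (hypothetical)
features = torch.randn(1000, 11)      # per-Gaussian features fed to the gate (hypothetical)
pos_res = torch.randn(1000, 3)        # raw position residuals
attr_res = codec(latents)             # decoded, quantized non-position attribute residuals
sparse_pos = gate(features, pos_res)  # sparsified position residuals
```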
NVIDIA plans to open-source the QUEEN code to further advance the development of AI applications, marking important progress in the field of video streaming. The launch of QUEEN opens new possibilities for content delivery and user engagement, pointing toward a future of AI-driven video streaming.