Hangzhou-based tech company Qunhe Technology has recently garnered industry attention for its open-source spatial understanding model, SpatialLM, which was acknowledged by Google in a recent research paper. The model's innovation lies in enabling robots to understand the geometric relationships of the physical world from ordinary video, marking a significant breakthrough in robot training.
SpatialLM's core function is to convert video captured on a mobile phone into three-dimensional spatial layout information. A user simply records a walkthrough of their home with a phone, and SpatialLM generates a detailed 3D scene that includes room structure, furniture placement, and passage widths. This markedly reduces the cost of robot training and improves its efficiency.
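To make the idea of a structured layout concrete, here is a minimal sketch in Python of what such output might look like. The class names (`WallSegment`, `FurnitureBox`) and the `passage_width` helper are illustrative assumptions, not SpatialLM's actual output schema.

```python
from dataclasses import dataclass

@dataclass
class WallSegment:
    """A wall described by its two endpoints on the floor plan (metres)."""
    start: tuple[float, float]
    end: tuple[float, float]
    height: float

@dataclass
class FurnitureBox:
    """An axis-aligned bounding box for a piece of furniture (metres)."""
    label: str
    center: tuple[float, float, float]   # x, y, z of the box centre
    size: tuple[float, float, float]     # width, depth, height

def passage_width(a: FurnitureBox, b: FurnitureBox) -> float:
    """Clearance between two boxes along the x axis (a simplified 1-D check)."""
    a_right = a.center[0] + a.size[0] / 2
    b_left = b.center[0] - b.size[0] / 2
    return max(0.0, b_left - a_right)

# Example: a sofa and a table standing 0.9 m apart along x.
sofa = FurnitureBox("sofa", center=(0.0, 1.0, 0.4), size=(2.0, 0.9, 0.8))
table = FurnitureBox("table", center=(2.5, 1.0, 0.35), size=(1.2, 0.8, 0.7))
print(f"passage width: {passage_width(sofa, table):.2f} m")  # -> 0.90 m
```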
At GTC 2025, Qunhe Technology also showcased its virtual training platform, SpatialVerse. Fed with data generated by SpatialLM, the platform lets robots train in simulated environments on tasks such as obstacle avoidance and grasping, forming a complete closed loop from cognition to action. In short, through this system robots can not only "see" the spatial layout but also learn how to operate within it, as the sketch below illustrates.
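As a rough illustration of that cognition-to-action loop, the following toy example steps an agent through a grid whose obstacles are derived from a furniture layout. The environment class, reward values, and greedy policy are assumptions made for illustration; they are not SpatialVerse's API.

```python
import numpy as np

class ToyLayoutEnv:
    """A toy grid world whose obstacle cells stand in for furniture from a layout."""

    def __init__(self, size=10, obstacles=((3, 3), (3, 4), (6, 7))):
        self.size = size
        self.obstacles = set(obstacles)   # occupied cells derived from furniture boxes
        self.goal = (size - 1, size - 1)
        self.pos = (0, 0)

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        """Actions: 0=up, 1=down, 2=left, 3=right."""
        dx, dy = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        nxt = (min(max(self.pos[0] + dx, 0), self.size - 1),
               min(max(self.pos[1] + dy, 0), self.size - 1))
        if nxt in self.obstacles:          # bumping into furniture is penalised
            return self.pos, -1.0, False
        self.pos = nxt
        done = self.pos == self.goal
        return self.pos, (1.0 if done else -0.01), done

# Greedy policy: move toward the goal, falling back to a random move when blocked.
env, rng = ToyLayoutEnv(), np.random.default_rng(0)
obs, done = env.reset(), False
while not done:
    action = 1 if obs[0] < env.goal[0] else 3
    nxt, reward, done = env.step(action)
    if nxt == obs:                         # blocked: try a random direction instead
        nxt, reward, done = env.step(int(rng.integers(4)))
    obs = nxt
print("reached goal:", obs == env.goal)
```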
SpatialLM's operating principle is not complex. It uses MASt3R-SLAM to break a video into individual frames and reconstruct the scene, including objects such as sofas and tables, as a point cloud. The model then converts this point cloud into a structured 3D layout, recording key information about each object, such as its dimensions and position. Compared to traditional training methods, SpatialLM not only saves time and resources but also strengthens a robot's spatial cognition.
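The pipeline can be pictured roughly as follows: sample frames from the video, reconstruct a point cloud, and hand that cloud to the model for layout prediction. The sketch below only wires these stages together; `reconstruct_point_cloud` and `predict_layout` are hypothetical stand-ins for the MASt3R-SLAM reconstruction and SpatialLM inference steps, not actual APIs from either project.

```python
import cv2          # pip install opencv-python
import numpy as np

def sample_frames(video_path: str, stride: int = 10) -> list[np.ndarray]:
    """Read a video and keep every `stride`-th frame as an RGB array."""
    cap, frames, i = cv2.VideoCapture(video_path), [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % stride == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        i += 1
    cap.release()
    return frames

def reconstruct_point_cloud(frames: list[np.ndarray]) -> np.ndarray:
    """Placeholder for MASt3R-SLAM reconstruction: returns an (N, 3) point array."""
    raise NotImplementedError("run a SLAM / multi-view reconstruction step here")

def predict_layout(points: np.ndarray) -> list[dict]:
    """Placeholder for SpatialLM inference: point cloud in, structured layout out."""
    raise NotImplementedError("run the layout-prediction model here")

# Intended usage, once the two placeholder stages are filled in:
#   frames = sample_frames("living_room.mp4")
#   points = reconstruct_point_cloud(frames)
#   layout = predict_layout(points)   # e.g. [{"label": "sofa", "size": ..., "position": ...}, ...]
```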
What sets this technology apart is that it lets robots understand and handle complex environmental changes much as humans do. Whether dealing with everyday household items or workplace tools, SpatialLM helps robots adapt quickly and carry out tasks. This capability is crucial for improving robot performance in real-world environments, especially in the current embodied intelligence field, where many technologies still face challenges in practical application.
By open-sourcing SpatialLM and pairing it with the SpatialVerse platform, Qunhe Technology is reshaping how robots are trained, enabling them to handle a wide range of real-world challenges more flexibly.
Project Address: https://github.com/manycore-research/SpatialLM