1X Technologies, a robotics startup, has developed a new generative model that makes training robotic systems in simulation more efficient. The company announced the model, called the "1X World Model," in a recent blog post. It addresses one of the significant challenges in robotics: predicting how the world responds to a robot's actions. Starting from the same initial image sequence, the model can envision multiple future scenarios, one for each proposed sequence of actions.


This capability enables the model to predict complex object interactions, such as the movement of rigid bodies, the effects of objects falling, and interactions with deformable objects (like curtains and clothes) and articulated objects (like doors and drawers).

Evaluation is a practical yet often overlooked challenge in building general-purpose robots. If a robot is trained to complete 1000 unique tasks, it's difficult to determine if a new model has improved across all 1000 tasks. Small environmental changes, such as background and lighting variations, can render old experimental results obsolete, especially in dynamic home or office environments.


To overcome this issue, 1X has adopted a novel approach: building simulators directly from real sensor data and using them to evaluate its robot policies across millions of scenarios. This simulator not only allows for repeatable testing but also captures the full complexity of the real world.

For training, 1X collected thousands of hours of data from its humanoid robots performing various mobile manipulation tasks in homes and offices. Using this data, 1X's world model learns to predict future video conditioned on past observations and actions.
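1X has not published the architecture of its world model, but the interface described above — same starting observation, different action proposals, different predicted futures — can be sketched minimally. The class below is a toy stand-in: it replaces the learned generative video model with a fixed linear update, purely so the action-conditioned rollout pattern is runnable. All names and dynamics here are hypothetical.

```python
import numpy as np

class ToyWorldModel:
    """Hypothetical action-conditioned world model: given an initial frame
    and a proposed action sequence, autoregressively predict future frames.
    The real 1X model is a learned generative video model; this toy version
    uses a fixed linear update so the interface is runnable."""

    def __init__(self, frame_dim: int, action_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Stand-in "learned" dynamics: next_frame = A @ frame + B @ action
        self.A = 0.9 * np.eye(frame_dim)
        self.B = rng.normal(scale=0.1, size=(frame_dim, action_dim))

    def step(self, frame: np.ndarray, action: np.ndarray) -> np.ndarray:
        return self.A @ frame + self.B @ action

    def rollout(self, initial_frame: np.ndarray, actions: np.ndarray) -> np.ndarray:
        """Predict one future frame per proposed action."""
        frames, frame = [], initial_frame
        for a in actions:
            frame = self.step(frame, a)
            frames.append(frame)
        return np.stack(frames)

# Same initial observation, two different action proposals -> two futures.
model = ToyWorldModel(frame_dim=4, action_dim=2)
start = np.ones(4)
plan_a = np.tile([1.0, 0.0], (5, 1))  # e.g. "reach left"
plan_b = np.tile([0.0, 1.0], (5, 1))  # e.g. "reach right"
future_a = model.rollout(start, plan_a)
future_b = model.rollout(start, plan_b)
print(future_a.shape)                   # (5, 4): five predicted frames
print(np.allclose(future_a, future_b))  # False: different actions diverge
```

This branching-rollout interface is also what makes the model usable as an evaluator: a candidate policy's proposed actions can be "played forward" inside the model instead of on hardware.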

Given different action commands, the model generates diverse outcomes, demonstrating robust simulation of object interactions. Even without specific actions, it can generate plausible video, such as identifying and avoiding people and obstacles while driving.

Additionally, the model can generate longer task videos, such as folding a T-shirt.

Of course, 1X's model still faces some challenges: it does not always preserve the shape and color of objects during interactions, and objects sometimes disappear entirely. Its grasp of physical laws is also limited; for example, objects occasionally appear to float in the air.

To advance research in this field, 1X has released over 100 hours of vector-quantized videos and pre-trained baseline models, and launched the 1X World Model Challenge, which includes multiple stages and cash prizes to encourage further research.

Key Points:

🌟 The world model is a virtual simulator capable of predicting how the environment interacts with the robot's actions.

🤖 Through learning from real data, the model can be evaluated across millions of scenarios, enhancing robot intelligence.

💰 To promote research, the 1X World Model Challenge has been launched, offering cash incentives.