Norwegian startup 1X Technologies recently claimed significant progress in developing AI-based world models for robots. Simply put, these models serve as virtual testing grounds, allowing robots to be evaluated and improved across a range of scenarios without real-world trials.
1X believes this is the key to solving the "robot dilemma": how to reliably evaluate robots trained for multiple tasks in constantly changing environments. Take a robot trained to fold T-shirts, for example. Its performance varied over a 50-day period, so any apparent gain often proved fleeting.
1X notes that even the same robot model can exhibit significant performance fluctuations with environmental changes, making rigorous real-world assessments extremely difficult.
To train their world models, 1X collected thousands of hours of video footage of their humanoid robot EVE performing various tasks in homes and offices. Through machine learning, the models can now reasonably predict how objects and environments react to the robot's actions. They can even generate plausible video for behaviors that were not explicitly programmed, such as avoiding contact with people and objects.
Currently, 1X's models can handle some complex physical interactions, such as grabbing and lifting objects, opening doors and drawers, and dealing with deformable materials like clothing, even folding T-shirts.
The core value of their world models lies in simulating object interactions. In one set of generated videos, for example, the model receives the same initial scene and three different sets of actions for grabbing boxes. In each case, the grabbed box is lifted and moved along with the robot hand, while the other boxes remain stationary.
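The idea of conditioning predictions on actions can be illustrated with a toy sketch. This is not 1X's model, which is learned from video and generates images. Here a hand-written `step` function stands in for the learned predictor, and the `Scene` class, the grid coordinates, and the action names `grab` and `move` are all assumptions made for illustration. The sketch starts three rollouts from the same initial scene, each with a different action sequence, and only the grabbed box changes position.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]
Action = Tuple[str, object]  # hypothetical format: ("grab", box_name) or ("move", (dx, dy))


@dataclass(frozen=True)
class Scene:
    """Toy scene: box positions on a grid, plus the box currently held (if any)."""
    boxes: Dict[str, Position]
    held: Optional[str] = None


def step(scene: Scene, action: Action) -> Scene:
    """Hand-written stand-in for a learned world model: predict the next scene."""
    kind, arg = action
    if kind == "grab":
        # Grabbing changes what is held, not where anything is.
        return Scene(dict(scene.boxes), held=arg)
    if kind == "move" and scene.held is not None:
        dx, dy = arg
        x, y = scene.boxes[scene.held]
        boxes = dict(scene.boxes)  # copy so the input scene is left untouched
        boxes[scene.held] = (x + dx, y + dy)
        return Scene(boxes, held=scene.held)
    return scene  # unknown action, or moving with nothing held: no change


def rollout(scene: Scene, actions: List[Action]) -> Scene:
    """Apply a sequence of actions, one predicted step at a time."""
    for action in actions:
        scene = step(scene, action)
    return scene


# Same initial scene, three different action sequences (one per box).
initial = Scene({"a": (0, 0), "b": (2, 0), "c": (4, 0)})
action_sets = {
    "a": [("grab", "a"), ("move", (0, 1)), ("move", (1, 0))],
    "b": [("grab", "b"), ("move", (0, 2))],
    "c": [("grab", "c"), ("move", (-1, 1))],
}
results = {name: rollout(initial, actions) for name, actions in action_sets.items()}

for name, final in results.items():
    print(name, final.boxes)
```

In a learned world model the `step` function would be replaced by a network that predicts future video frames from past frames and actions, but the evaluation pattern is the same: hold the starting scene fixed, vary the actions, and compare the predicted outcomes.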
1X nevertheless acknowledges some limitations. The model sometimes fails to keep object colors and shapes consistent or to simulate physical phenomena accurately, and it does not yet reliably recognize the robot's own reflection in a mirror.
Despite the challenges, 1X still views these world models as a milestone in developing and training general-purpose robots. To accelerate progress, the company also offers datasets, pre-trained models, and prizes through the "1X World Model Challenge."
1X's long-term goal is to train robots directly in world models, which would be far more efficient than real-world testing. To that end, the company is actively recruiting AI experts. Earlier this year, 1X raised $100 million to bring its household humanoid robot Neo to market, with backing from investors including OpenAI, a sign of the high expectations for 1X's technology.
In addition to 1X, NVIDIA is also making significant investments in humanoid robots. The company recently introduced a training method using Apple's Vision Pro, and NVIDIA researcher Jim Fan believes that within the next few years, robotics will experience a "GPT-3 moment."