The rapid advancement of deep learning is inseparable from the scaling of datasets, models, and computational power. In the fields of natural language processing and computer vision, researchers have discovered a power-law relationship between model performance and data scale. However, in the field of robotics, particularly in robot manipulation, similar scaling laws have not yet been established.
A research team from Tsinghua University recently published a paper that explores data scaling laws in robot imitation learning and proposes an efficient data collection strategy. Using this strategy, they collected enough data in a single afternoon to train policies that reach success rates of approximately 90% in new environments and with new objects.
The researchers split generalization into two dimensions, environment generalization and object generalization, and collected human demonstrations with a handheld gripper across a variety of environments and objects. They then trained policies on this data using Diffusion Policy. The researchers initially focused on two tasks: pouring water and arranging a computer mouse. By analyzing how policy performance varied with the number of training environments or objects, they derived data scaling laws.
The study results indicate:
The policy's ability to generalize to new objects, new environments, or both follows an approximate power law in the number of training objects, training environments, or training environment-object pairs, respectively.
Increasing the diversity of environments and objects is more effective than increasing the number of demonstrations for each environment or object.
Collecting data in as many environments as possible (e.g., 32 environments), with one unique object and 50 demonstrations per environment, can train a policy that generalizes strongly (90% success rate) and operates in new environments and with new objects.
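To make the power-law finding concrete, here is a minimal sketch of how such a relationship could be fit. It assumes the fitted quantity is an error-like measure (for example, one minus the success rate) that shrinks as the number of training environments grows. That choice, the function name `fit_power_law`, and the synthetic numbers are illustrative assumptions, not values or code from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Fit y ~ a * x**b by ordinary least squares in log-log space.

    Returns (a, b). All inputs must be strictly positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the best-fit line through the (log x, log y) points.
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    b = sxy / sxx
    # The intercept of the line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic, hypothetical data for illustration only (NOT from the paper):
# a failure rate generated from an exact power law in the number of environments.
num_envs = [1, 2, 4, 8, 16, 32]
failure_rate = [0.8 * n ** -0.6 for n in num_envs]

a, b = fit_power_law(num_envs, failure_rate)
print(f"fitted coefficient a = {a:.3f}, exponent b = {b:.3f}")
```

A power law appears as a straight line on log-log axes, so the slope of that line estimates the exponent. With real, noisy measurements the fit would only be approximate.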
Based on these scaling laws, the researchers proposed an efficient data collection strategy. They suggest collecting data in as many different environments as possible, each with a unique object. Once the total number of environment-object pairs reaches 32, the data is usually sufficient to train a policy that can operate in new environments and interact with previously unseen objects. For each environment-object pair, they recommend collecting 50 demonstrations.
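The recommended budget can be written out as a small planning sketch. The class name `CollectionPlan`, its fields, and the placeholder environment and object labels are hypothetical and exist only for illustration. Only the figures of 32 pairs and 50 demonstrations per pair come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionPlan:
    """Hypothetical helper describing a demonstration-collection budget."""

    num_pairs: int = 32       # environment-object pairs, as recommended above
    demos_per_pair: int = 50  # demonstrations per pair, as recommended above

    def total_demos(self) -> int:
        return self.num_pairs * self.demos_per_pair

    def assignments(self):
        # One unique object per environment: environment i is paired with object i.
        # The labels are placeholders, not names used by the researchers.
        return [
            (f"env_{i:02d}", f"object_{i:02d}", self.demos_per_pair)
            for i in range(self.num_pairs)
        ]


plan = CollectionPlan()
print(plan.total_demos())  # 32 * 50 = 1600 demonstrations in total
for env, obj, demos in plan.assignments()[:3]:
    print(env, obj, demos)
```

Pairing each environment with its own object means every new collection site adds diversity along both axes at once, which matches the finding that diversity matters more than extra demonstrations per environment or object.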
To verify that the data collection strategy applies more broadly, the researchers applied it to two new tasks: folding towels and unplugging chargers. The results showed that it also produced policies that generalize well on these new tasks.
The study demonstrates that, with relatively modest time and resources, it is possible to learn a single-task policy that can be deployed zero-shot to any environment and object. To support further work in this area, the Tsinghua team has released their code, data, and models, in the hope of inspiring more research and ultimately achieving a general-purpose robot capable of solving complex, open-world problems.
Paper link: https://arxiv.org/pdf/2410.18647