ByteDance's research team recently released GR-2 (Generative Robot 2.0), its second-generation large model for robots, which has attracted widespread attention in the industry. The team presents the model as a significant step forward in large-model technology for robotics and in the practical application of intelligent robots.
The unique aspect of GR-2 lies in its innovative learning method. The development team adopted a training approach that mimics human growth, allowing GR-2 to go through a learning phase akin to a "robot infancy." During the pre-training phase, GR-2 "watched" over 38 million internet videos from various public datasets, covering daily scenarios such as homes, outdoors, and offices. This distinctive "learning by watching videos" method equipped GR-2 with a rich knowledge base and a deep understanding of human daily behaviors and complex environmental conditions.
After this large-scale pre-training, the development team fine-tuned the model, significantly enhancing its ability to predict actions and generate videos. Given a simple natural-language instruction, such as "pick up the fork from the left side of the white plate," GR-2 can generate a video of the task being carried out together with the actions needed to complete it. This capability opens new possibilities for intelligent decision-making and autonomous operation in robots.
In terms of performance, GR-2 has demonstrated impressive results. As the model scales up, its ability to handle complex tasks and adapt to new environments improves significantly. In multi-task learning tests, GR-2 achieved an average success rate of 97.7% across 105 tabletop tasks. Notably, GR-2 not only handles tasks it has seen before but also adapts when faced with new environments, objects, or tasks.
Another highlight of GR-2 is its ability to work together with large language models. For example, when a user asks for a cup of coffee, GR-2 can autonomously complete the entire process, from fetching the cup and placing it to brewing the coffee and serving it, showcasing a high level of intelligence and automation.
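The coffee example can be read as a planner-plus-executor loop: a language model breaks the request into short instructions, and the robot model carries them out one at a time. The following is only a minimal sketch of that idea, not ByteDance's implementation. The names `plan_subtasks`, `run_request`, and `StepResult` are hypothetical, the planner is a hard-coded stand-in for a real language model call, and `policy` is a placeholder for whatever executes a single instruction on the robot.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    instruction: str  # the sub-instruction that was attempted
    success: bool     # whether the (hypothetical) policy reported success


def plan_subtasks(request: str) -> List[str]:
    """Hypothetical stand-in for a large language model planner.

    A real system would query a language model here; this stub only
    hard-codes the four steps named in the coffee example.
    """
    if "coffee" in request.lower():
        return [
            "fetch the cup",
            "place the cup",
            "brew the coffee",
            "serve the coffee",
        ]
    # Assumption: anything else is treated as a single-step task.
    return [request]


def run_request(request: str, policy: Callable[[str], bool]) -> List[StepResult]:
    """Run each planned sub-instruction in order, stopping at the first failure."""
    results: List[StepResult] = []
    for instruction in plan_subtasks(request):
        ok = policy(instruction)  # placeholder for the robot executing one step
        results.append(StepResult(instruction, ok))
        if not ok:
            break  # later steps depend on earlier ones, so stop here
    return results


if __name__ == "__main__":
    # Dummy policy that always succeeds, used only to show the control flow.
    for step in run_request("I need a cup of coffee", lambda instruction: True):
        print(step.instruction, "->", "ok" if step.success else "failed")
```

Stopping at the first failed step is a design choice made for this sketch: in a sequential task such as making coffee, each step assumes the previous one succeeded, so a real system would more likely replan or retry than simply halt.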
In terms of environmental adaptability, GR-2 also performs exceptionally well. Whether it's dealing with changes in item positions in fruit and vegetable sorting tasks or performing end-to-end object picking in industrial applications, GR-2 can accurately identify the target and complete the task, demonstrating significant flexibility and adaptability in practical applications.
Although GR-2 has demonstrated outstanding performance in many respects, the development team acknowledges that there is still room for improvement in the diversity of its real-world action data. In that sense, GR-2 is better seen not as a finished, static model but as a system that can keep learning and adapting to a wider range of tasks, with considerable potential ahead.
The emergence of GR-2 brings new possibilities to the field of intelligent robots. From home services to industrial automation, the technology showcased by GR-2 has the potential to make an impact across multiple domains. As the technology matures and its application scenarios expand, there is reason to expect that GR-2 and similar intelligent robot systems will significantly change how we live and work.
Project link: https://gr2-manipulation.github.io/