Google is using Gemini AI to train its robots to navigate and complete tasks more effectively.
In a new research paper, the DeepMind Robotics team explains how the long context window of Gemini 1.5 Pro makes it easier for users to interact with the RT-2 robot through natural language commands. Researchers filmed video tours of designated areas and used Gemini 1.5 Pro to have the robot "watch" the footage and learn the environment. The robot can then act on commands based on what it has observed, such as guiding a user to a power outlet to charge a device.
DeepMind states that Gemini-equipped robots achieved a 90% success rate across more than 50 user commands in a 9,000-square-foot operating area.
The researchers also found that Gemini 1.5 Pro lets robots plan how to carry out commands that go beyond navigation. For example, when a user at a table full of cola cans asks the robot whether their favorite drink is available, Gemini directs the robot to check the fridge and then report the result to the user. DeepMind says it will investigate these findings further.
Although the video demonstrations provided by Google are impressive, the research paper notes that the robot takes 10 to 30 seconds to process these commands. It may be some time before we share our homes with more advanced environment-mapping robots, but such robots might at least help us find lost keys or wallets.
Highlights:
🤖 Gemini AI trains robots to improve navigation and task completion capabilities
🧠 Gemini 1.5 Pro enables robots to execute natural language commands
🔍 Research shows Gemini lets robots plan how to complete commands beyond navigation