Google is using Gemini AI to train its robots, enhancing their navigation and task-completion abilities.

In a new research paper, the DeepMind Robotics Team details how it uses the long context window of Gemini 1.5 Pro to make it easier for users to interact with its RT-2 robots through natural-language commands. Researchers filmed video tours of designated areas and had Gemini 1.5 Pro let the robot "watch" the footage to learn the environment, so the robot can then carry out commands based on what it has observed, such as guiding a user to a power outlet for charging.
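The core idea, grounding a spoken command against a previously recorded tour of the space, can be sketched in miniature. In DeepMind's system, Gemini 1.5 Pro does this grounding directly over the raw tour video in its long context; the toy below stands in for that model call with simple keyword matching over hand-annotated tour frames. All frame labels, captions, and function names here are hypothetical illustrations, not the paper's actual data or API.

```python
# Toy sketch: ground a natural-language command against frames from a
# recorded tour, then return a navigation goal. The keyword overlap below
# is only a stand-in for the multimodal model's video understanding.

from dataclasses import dataclass

@dataclass
class TourFrame:
    timestamp_s: float  # when this frame appears in the tour video
    location: str       # hypothetical label for where the robot was
    description: str    # hypothetical caption of what the frame shows

# Hypothetical annotated tour of an office space.
TOUR = [
    TourFrame(12.0, "kitchen", "fridge and counter with cola cans"),
    TourFrame(45.5, "desk area", "desks with monitors and a power outlet"),
    TourFrame(78.0, "lobby", "entrance with couches and a whiteboard"),
]

def ground_command(command: str, tour: list[TourFrame]) -> TourFrame:
    """Pick the tour frame whose caption best matches the command
    (a crude proxy for the model grounding the request in the video)."""
    words = set(command.lower().split())
    return max(tour, key=lambda f: len(words & set(f.description.lower().split())))

goal = ground_command("find the power outlet", TOUR)
# The robot would then navigate toward goal.location ("desk area").
```

The real system needs no such annotations: the long context window lets the entire tour video serve as the "map", which is what makes the approach notable.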


DeepMind states that robots equipped with Gemini successfully executed more than 50 user commands across a 9,000-square-foot operational area, with a success rate of 90%.

Additionally, the researchers found that Gemini 1.5 Pro enables robots to plan how to carry out commands, not just navigate. For example, when a user with a desk full of cola cans asks the robot whether their favorite drink is available, Gemini directs the robot to check the fridge and then report back to the user. DeepMind says it will investigate these findings further.

While Google's video demonstrations are impressive, the paper notes that the robot takes 10-30 seconds to process each command. It may be some time before we share our homes with more advanced environment-mapping robots, but at least they might help us find our lost keys or wallets.

Highlights:

🤖 Gemini AI trains robots to improve their navigation and task-completion capabilities

🧠 Gemini 1.5 Pro enables robots to execute natural-language commands

🔍 Research shows Gemini lets robots plan tasks beyond navigation