Want to Make Robots Smarter? Tsinghua Team Discovers the Secret to Accelerated Robot Learning

AIbase基地

Published in AI News · 5 min read · Nov 12, 2024


The rapid advancement of deep learning is inseparable from the scaling of datasets, models, and computational power. In the fields of natural language processing and computer vision, researchers have discovered a power-law relationship between model performance and data scale. However, in the field of robotics, particularly in robot manipulation, similar scaling laws have not yet been established.

A research team from Tsinghua University recently published a paper exploring data scaling laws in robot imitation learning and proposed an efficient data collection strategy. According to the team, enough data can be collected in a single afternoon for the trained policy to reach a success rate of roughly 90% in new environments and with new objects.

The researchers split generalization into two dimensions, environment generalization and object generalization, and collected human demonstration data with a handheld gripper across various environments and objects. They then trained a Diffusion Policy on this data. The researchers initially focused on two tasks: pouring water and placing a mouse. By analyzing how the policy's performance varied with the number of training environments or objects, they derived data scaling laws.

The study results indicate:

The policy's ability to generalize to new objects, new environments, or both follows an approximate power-law relationship with the number of training objects, training environments, or training environment-object pairs, respectively.

Increasing the diversity of environments and objects is more effective than increasing the number of demonstrations for each environment or object.

Collecting data in as many environments as possible (e.g., 32 environments), with one unique object and 50 demonstrations per environment, is enough to train a policy that generalizes strongly (about a 90% success rate) to new environments and new objects.
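To make the power-law finding concrete: a relationship of the form y = a * x^b becomes a straight line in log-log space, so its parameters can be estimated with ordinary linear regression on log(x) and log(y). The sketch below is illustrative only. The function name `fit_power_law`, the choice of a failure-rate-style metric, and the data points are assumptions; the data is synthetic and generated from a known curve, not taken from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y).

    Hypothetical helper for illustration; not the authors' code.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the log-log line is the power-law exponent b.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    # Intercept of the log-log line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# SYNTHETIC data: generated exactly from y = 0.8 * x**-0.5, where x stands
# for the number of training environments and y for an error-like metric.
num_envs = [1, 2, 4, 8, 16, 32]
error_metric = [0.8 * n ** -0.5 for n in num_envs]

a, b = fit_power_law(num_envs, error_metric)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A negative exponent b means the error-like metric shrinks as more training environments are added, with diminishing returns at larger counts.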

Based on these scaling laws, the researchers proposed an efficient data collection strategy. They suggest collecting data in as many different environments as possible, each with a unique object. Once the total number of environment-object pairs reaches 32, the data is usually sufficient to train a policy that can operate in new environments and interact with previously unseen objects. For each environment-object pair, they recommend collecting 50 demonstrations.
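The recommendation above implies a simple data budget: 32 environment-object pairs at 50 demonstrations each is 1,600 demonstrations in total. The following sketch lays that plan out in code. The class `CollectionPlan` and the placeholder names such as `env_00` and `obj_00` are hypothetical and only serve to illustrate the one-unique-object-per-environment layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionPlan:
    """Hypothetical container for the recommended collection budget."""
    num_pairs: int = 32        # environment-object pairs (from the article)
    demos_per_pair: int = 50   # demonstrations per pair (from the article)

    def total_demos(self) -> int:
        return self.num_pairs * self.demos_per_pair

    def schedule(self) -> list[tuple[str, str, int]]:
        # One unique object per environment, so pair i uses env i and obj i.
        return [
            (f"env_{i:02d}", f"obj_{i:02d}", self.demos_per_pair)
            for i in range(self.num_pairs)
        ]


plan = CollectionPlan()
print(plan.total_demos())   # 32 * 50 demonstrations
print(plan.schedule()[0])   # first (environment, object, demos) entry
```

Keeping the pair count and the per-pair demonstration count as separate fields makes it easy to explore the trade-off the study highlights, where adding more pairs matters more than adding more demonstrations per pair.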

To verify the general applicability of the data collection strategy, the researchers applied it to two new tasks: folding towels and unplugging chargers. The results showed that the same approach also produced policies that generalize well on these new tasks.

The study demonstrates that, with relatively modest time and resources, it is possible to learn a single-task policy that can be deployed zero-shot in any environment and on any object. To support further work in this area, the Tsinghua team has released its code, data, and models, hoping to inspire more research and ultimately help achieve a general-purpose robot capable of solving complex, open-world problems.

Paper link: https://arxiv.org/pdf/2410.18647



This article is from AIbase Daily
