Recently, a research team from Microsoft, in collaboration with researchers from several universities, released a multimodal AI model named “Magma.” The model is designed to process and integrate images, text, and video so that it can carry out complex tasks in both digital and physical environments. As the technology matures, multimodal AI agents of this kind are finding wide application in robotics, virtual assistants, and user interface automation.
Previous AI systems typically focused on either vision-language understanding or robotic manipulation, and combining the two capabilities in a single unified model has proven difficult. Many existing models perform well in their own niche but generalize poorly across application scenarios. For instance, Pix2Act and WebGUM excel at UI navigation, while OpenVLA and RT-2 are better suited to robotic manipulation; they typically require separate training and struggle to bridge the gap between digital and physical environments.
The “Magma” model aims to overcome these limitations. Through a unified pretraining recipe, it combines multimodal understanding, action grounding, and planning, enabling a single agent to operate across very different environments. Magma's training corpus comprises roughly 39 million samples spanning images, videos, and robotic action trajectories. The model also relies on two techniques: “Set-of-Mark” (SoM) and “Trace-of-Mark” (ToM). The former lets the model identify actionable visual elements, such as clickable controls in a UI, while the latter lets it track how objects move over time, strengthening its ability to plan future actions.
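To make the SoM and ToM ideas concrete, here is a minimal, illustrative sketch in Python. It is not Magma's actual implementation; the helper names, data formats, and the simple box-and-trace representation are assumptions made purely for explanation. SoM is shown as overlaying numbered marks on candidate actionable regions so a model can refer to them by index, and ToM as turning per-frame positions of a mark into short future traces that can serve as planning targets.

```python
# Illustrative sketch only: a simplified recreation of the Set-of-Mark (SoM)
# and Trace-of-Mark (ToM) ideas described above, not Magma's released code.
# Helper names and data formats here are hypothetical.
from dataclasses import dataclass
from PIL import Image, ImageDraw

@dataclass
class Mark:
    index: int   # numeric label drawn on the image
    box: tuple   # (x0, y0, x1, y1) of a candidate actionable region

def apply_set_of_mark(image: Image.Image, boxes: list) -> tuple:
    """Overlay numbered marks on candidate actionable regions (e.g. UI buttons),
    so a vision-language model can answer with a mark index instead of raw pixels."""
    marks = [Mark(i, box) for i, box in enumerate(boxes)]
    draw = ImageDraw.Draw(image)
    for m in marks:
        x0, y0, _, _ = m.box
        draw.rectangle(m.box, outline="red", width=2)
        draw.text((x0 + 2, y0 + 2), str(m.index), fill="red")
    return image, marks

def build_trace_of_mark(tracked_positions: dict, horizon: int = 4) -> dict:
    """Turn per-frame (x, y) positions of each mark into short future traces,
    i.e. planning targets of the form "where will this object move next?"."""
    traces = {}
    for mark_index, positions in tracked_positions.items():
        # keep only the next `horizon` positions after the current frame
        traces[mark_index] = positions[1:1 + horizon]
    return traces

# Example usage with dummy data:
if __name__ == "__main__":
    screenshot = Image.new("RGB", (320, 240), "white")
    annotated, marks = apply_set_of_mark(screenshot, [(10, 10, 80, 40), (100, 50, 180, 90)])
    # Pretend a tracker followed mark 0 across five video frames:
    traces = build_trace_of_mark({0: [(45, 25), (60, 30), (75, 38), (90, 45), (105, 52)]})
    print(traces)  # {0: [(60, 30), (75, 38), (90, 45), (105, 52)]}
```

The appeal of this style of labeling is that UI screenshots, videos, and robot trajectories can all be supervised with the same kind of mark-based targets, which is what allows one model to learn grounding and planning jointly.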
The “Magma” model pairs large-scale pretraining with a deep learning architecture tuned for multiple domains. A ConvNeXt-XXL visual backbone processes images and videos, while a LLaMA-3-8B language model handles text input, and the two are joined so that visual understanding, language, and action execution share a single model. After comprehensive training, the model achieves strong results across a range of tasks, demonstrating robust multimodal understanding and spatial reasoning.
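The sketch below illustrates the general "vision encoder → projector → language model" pattern that the paragraph above describes, using tiny placeholder dimensions and stand-in modules. The class name, layer sizes, and vocabulary are hypothetical; a real system of this kind would plug in the actual ConvNeXt-XXL backbone and LLaMA-3-8B decoder rather than the toy components shown here.

```python
# Minimal sketch of a vision-language-action architecture in the style described
# above. All modules and sizes are placeholders, not Magma's released code.
import torch
import torch.nn as nn

class VisionLanguageActionSketch(nn.Module):
    def __init__(self, vision_dim=256, lm_dim=512, vocab_size=1000):
        super().__init__()
        # Stand-in for a ConvNeXt-style visual backbone producing patch features.
        self.vision_backbone = nn.Conv2d(3, vision_dim, kernel_size=32, stride=32)
        # Projector maps visual features into the language model's token space.
        self.projector = nn.Linear(vision_dim, lm_dim)
        self.text_embed = nn.Embedding(vocab_size, lm_dim)
        # Stand-in for a decoder-only LLM (a real model would be causal and far larger).
        layer = nn.TransformerEncoderLayer(d_model=lm_dim, nhead=8, batch_first=True)
        self.language_model = nn.TransformerEncoder(layer, num_layers=2)
        # One shared head emits tokens that can encode text, SoM mark indices, or actions.
        self.output_head = nn.Linear(lm_dim, vocab_size)

    def forward(self, pixels: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        vis = self.vision_backbone(pixels)            # (B, vision_dim, H/32, W/32)
        vis = vis.flatten(2).transpose(1, 2)          # (B, num_patches, vision_dim)
        vis = self.projector(vis)                     # (B, num_patches, lm_dim)
        txt = self.text_embed(token_ids)              # (B, seq_len, lm_dim)
        # Visual tokens are prepended to the text tokens and processed jointly.
        hidden = self.language_model(torch.cat([vis, txt], dim=1))
        return self.output_head(hidden)

if __name__ == "__main__":
    model = VisionLanguageActionSketch()
    logits = model(torch.randn(1, 3, 256, 256), torch.randint(0, 1000, (1, 8)))
    print(logits.shape)  # torch.Size([1, 72, 1000]): 64 visual + 8 text positions
```

The key design point is the projector: visual patch features are mapped into the language model's embedding space so that image, text, and action-related tokens can all be handled by one sequence model.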
Project link: https://microsoft.github.io/Magma/
Key Highlights:
🌟 The Magma model has been trained on over 39 million samples, showcasing its powerful multimodal learning capabilities.
🤖 This model successfully integrates vision, language, and action, overcoming the limitations of existing AI models.
📈 Magma performs strongly on multiple benchmarks, demonstrating broad generalization and reliable decision-making and execution.