Microsoft has officially released and open-sourced Magma, its multi-modal AI agent foundation model. Compared with traditional smart assistants, Magma offers significantly enhanced multi-modal capabilities: it can handle images, videos, and text, bridging the gap between the digital and physical worlds.

Magma not only assists users with everyday tasks such as automated online shopping and weather checks, but can also work with physical robots to carry out more complex operations. For instance, during a real-life chess game, Magma can offer real-time strategic advice, enhancing the playing experience. It also has predictive capabilities, anticipating the future actions of people or objects in a video so that virtual assistants or robots can better understand their dynamic environment and react accordingly.


According to the official introduction, Magma's applications are extensive. It can help home robots learn to organize unfamiliar items, and it can generate step-by-step user-interface navigation instructions for virtual assistants facing unfamiliar tasks. These features give users more precise assistance and guidance when they encounter new environments or tasks.


Magma is a Vision-Language-Action (VLA) foundation model, trained on massive amounts of publicly available visual and language data. This training allows Magma to integrate language, spatial, and temporal intelligence, providing solutions for complex tasks in both the digital and physical worlds.

The open-sourcing of Magma provides developers and researchers with a powerful tool, fostering advancements in smart assistants and home robotics. In the future, as this technology matures, we can expect to see more innovative applications based on Magma in our daily lives.
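For developers who want to experiment, the released weights are distributed in the usual open-model way. The sketch below shows one plausible way to load them with the Hugging Face transformers library and run a single image-plus-text query; note that the repository id "microsoft/Magma-8B", the prompt wording, and the input file name are illustrative assumptions rather than details confirmed by this article.

```python
# A minimal sketch, assuming the Magma weights are hosted on Hugging Face under
# "microsoft/Magma-8B" (an assumed repository name) and ship a custom processor.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Magma-8B"  # assumption: actual repo id may differ
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16
).to("cuda")
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Feed the model one screenshot plus a question about the next UI action.
image = Image.open("screenshot.png")  # placeholder input image
prompt = "What UI action should be taken next to complete the purchase?"
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

The exact preprocessing and prompt format depend on the model card that ships with the release, so treat this only as a starting point for exploring the open-sourced checkpoint.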

Project address: https://microsoft.github.io/Magma/