Microsoft Launches New Model LAM: Enabling AI to Perform Real Actions and Execute Tasks Independently in Word

AIbase基地

Published in AI News · 5 min read · Jan 3, 2025


Microsoft's research team recently introduced an artificial intelligence technology called the "Large Action Model" (LAM), marking a new phase in AI development. Unlike conventional language models such as GPT-4o, which converse or offer suggestions, LAM can autonomously operate Windows programs and carry out the task itself.

LAM's advantage lies in its ability to understand various user inputs, including text, voice, and images, and to convert these requests into detailed action plans. It does not only create plans; it also adjusts its actions based on what happens in real time. Building LAM involves four main steps. First, the model learns to break tasks down into logical steps. Next, it learns to translate these plans into specific actions with the help of more advanced AI systems (such as GPT-4o). Then, LAM independently explores new solutions, including for problems that other AI systems cannot solve. Finally, it is fine-tuned using a reward mechanism.
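To make the idea of turning a request into a plan and then into concrete actions more tangible, here is a minimal Python sketch. It is not Microsoft's implementation: the `Action` and `Plan` classes, their fields, and the `run_plan` function are all hypothetical names chosen for illustration, and the retry loop is only a simple stand-in for adjusting to real-time conditions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One concrete UI step (hypothetical schema, not taken from the article)."""
    control: str      # name of the UI element to act on
    operation: str    # e.g. "click", "select", "type"
    argument: str = ""


@dataclass
class Plan:
    """A task broken into ordered steps, plus the actions that carry them out."""
    task: str
    steps: List[str]
    actions: List[Action] = field(default_factory=list)


def run_plan(plan: Plan, execute: Callable[[Action], bool], max_retries: int = 1) -> bool:
    """Run actions in order, retrying a failed action up to max_retries times.

    `execute` is an assumed callback that drives the application and
    returns True when the action succeeded.
    """
    for action in plan.actions:
        attempts = 0
        while not execute(action):
            attempts += 1
            if attempts > max_retries:
                return False  # give up on the whole plan
    return True
```

A caller would build a `Plan` for a request such as "make the title bold" and pass a function that actually performs each `Action`. Keeping the executor as a callback separates planning from the code that touches the application.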

In experiments, the research team built a LAM based on Mistral-7B and tested it in a Word environment. The model achieved a task success rate of 71%, compared with 63% for GPT-4o without visual information.

LAM was also faster, taking about 30 seconds per task against 86 seconds for GPT-4o. GPT-4o's success rate did rise to 75.5% when it was given visual information, which is above LAM's 71%, so LAM's clearest advantage is speed, alongside its lead over GPT-4o without visual input.
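The reported figures can be compared directly with a few lines of Python. This is only arithmetic on the numbers quoted in this article; the variable names are illustrative, and the article does not say which GPT-4o configuration the 86-second figure applies to.

```python
# Success rates in percent and task times in seconds, as reported in the article.
lam_success = 71
gpt4o_text_success = 63      # GPT-4o without visual information
gpt4o_visual_success = 75.5  # GPT-4o with visual information
lam_seconds = 30
gpt4o_seconds = 86           # configuration not specified in the article

speedup = gpt4o_seconds / lam_seconds
gap_text_only = lam_success - gpt4o_text_success
gap_visual = gpt4o_visual_success - lam_success

print(f"LAM is about {speedup:.1f}x faster per task")
print(f"LAM leads text-only GPT-4o by {gap_text_only} percentage points")
print(f"GPT-4o with visual input leads LAM by {gap_visual} percentage points")
```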

To construct the training data, the research team first collected 29,000 task-plan pairs from Microsoft documentation, wikiHow articles, and Bing searches. They then used GPT-4o to turn simple tasks into more complex ones, expanding the dataset to 76,000 pairs, roughly 2.6 times its original size. Ultimately, about 2,000 successful action sequences were included in the final training set.
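As a quick check on the dataset figures, the growth from 29,000 to 76,000 pairs can be computed directly. The sketch below uses only the rounded numbers given in this article, so the result is approximate; the variable names are illustrative.

```python
# Rounded dataset sizes as reported in the article.
initial_pairs = 29_000
expanded_pairs = 76_000

# Percentage increase relative to the initial size.
growth_pct = (expanded_pairs - initial_pairs) / initial_pairs * 100
# How many times larger the expanded dataset is.
scale_factor = expanded_pairs / initial_pairs

print(f"Increase: {growth_pct:.0f}%")
print(f"Scale factor: {scale_factor:.2f}x")
```

With these rounded inputs the increase comes out at about 162%, or about 2.6 times the initial size.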

Despite LAM's potential, the research team still faces challenges, including the risk of AI actions going wrong, regulatory issues, and technical limits on scaling and on adapting to different applications. Even so, the researchers believe LAM represents a significant shift in AI development, one in which intelligent assistants take a more active role in helping people complete real tasks.

Key Points:
🌟 LAM can autonomously operate Windows programs, going beyond traditional AI that can only converse.
⏱️ In the Word test, LAM achieved a task success rate of 71%, higher than the 63% of GPT-4o without visual information, and executed tasks faster.
📈 The research team enhanced the model's training effectiveness by expanding the number of task-plan pairs to 76,000.



This article is from AIbase Daily
