One Week of Data Beats Seven Years of Training? Microsoft's WHAMM Model Generates a Playable Quake II Demo in Real-Time

AIbase基地

Published in AI News · 7 min read · Apr 7, 2025

Microsoft recently unveiled WHAMM (World and Human Action MaskGIT Model), a research model that generates the classic game Quake II entirely inside the model and renders a playable version in real time. The project, released through Microsoft's Copilot Labs, explores the potential and the limits of generative AI in interactive media.

Revolutionizing Tradition: AI Models Directly Generate Playable Games

Earlier game AIs mostly controlled game characters or generated snippets of game content. WHAMM instead generates the entire game environment and its dynamics from scratch, responding to player actions in real time. Players interact directly with the Quake II world "imagined" by the model: they can move, jump, shoot, and place objects. The generated demo also preserves changes that players make to the environment and allows exploration of hidden areas.

WHAMM is part of Microsoft's "Muse" model family, which focuses on providing generative AI tools for game development. The previous version, WHAM-1.6B, was trained on the game Bleeding Edge but achieved only about one frame per second. WHAMM represents a significant leap in performance, generating over ten frames per second, enough to support real-time interaction within the model.

Technological Breakthrough: Less Data, Faster Generation

WHAMM's gains come from two changes: far less training data and a different image-generation strategy. WHAM-1.6B was trained on seven years of game data, whereas WHAMM needs only one week of Quake II data collected from a single level. Because this data was recorded by professional testers, it provides high-quality, targeted examples of game behavior, which lets the model learn more efficiently.

Technically, WHAMM abandons the autoregressive approach (generating image tokens one by one) used by WHAM-1.6B, adopting a MaskGIT strategy instead. This method allows the model to generate all image tokens in parallel across multiple iterations. This change significantly improves generation speed and increases output resolution from 300×180 pixels to 640×360 pixels.
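The parallel, multi-iteration decoding described above can be illustrated with a toy sketch. This is not Microsoft's implementation: `toy_predictor` is a hypothetical stand-in that returns random tokens and confidences instead of Transformer outputs, and the cosine masking schedule is an assumption borrowed from the original MaskGIT work, since the article does not say which schedule WHAMM uses.

```python
import math
import random

MASK = -1  # sentinel for a token that has not been decided yet


def toy_predictor(tokens, rng, vocab_size):
    """Hypothetical stand-in for the Transformer.

    Proposes a (token, confidence) pair for every position in one pass,
    which is the "parallel" part of the strategy.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def maskgit_decode(num_tokens, steps, vocab_size=16, seed=0):
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens  # start with every image token masked
    history = []                  # masked-token count after each step

    for step in range(1, steps + 1):
        proposals = toy_predictor(tokens, rng, vocab_size)

        # Assumed cosine schedule: how many tokens stay masked after this step.
        keep_masked = math.floor(
            num_tokens * math.cos(math.pi / 2 * step / steps)
        )

        # Among still-masked positions, commit the most confident proposals.
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[: len(masked) - keep_masked]:
            tokens[i] = proposals[i][0]

        history.append(tokens.count(MASK))

    return tokens, history


if __name__ == "__main__":
    final_tokens, remaining = maskgit_decode(num_tokens=64, steps=8)
    print(remaining)  # masked tokens left after each iteration
```

An autoregressive decoder would need one model call per token (64 here), while this loop needs one call per iteration (8 here), which is where the speed-up comes from.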

The WHAMM system's workflow is divided into three stages: first, ViT-VQGAN converts images into tokens; then, a "backbone" Transformer with about 500 million parameters predicts what will happen next based on context; finally, a smaller "refinement" module with 250 million parameters refines the predicted image tokens through multiple iterations. To generate new frames, the model uses the previous nine image-action pairs as context.
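The three-stage loop and its nine-pair context can be sketched as follows. All function and class names here are hypothetical, and the three stages are trivial placeholders rather than real models; the sketch only shows how a fixed-length context of image-action pairs rolls forward as frames are generated.

```python
from collections import deque

CONTEXT_LEN = 9  # image-action pairs used as context, per the article


def tokenize(frame):
    """Placeholder for the ViT-VQGAN encoder (here a frame is already a list of ints)."""
    return list(frame)


def backbone(context, action):
    """Placeholder for the ~500M-parameter backbone: drafts the next frame's tokens."""
    last_tokens, _ = context[-1]
    return [(t + action) % 256 for t in last_tokens]


def refine(tokens, iterations=2):
    """Placeholder for the ~250M-parameter refinement module."""
    for _ in range(iterations):
        tokens = list(tokens)  # a real module would re-predict tokens here
    return tokens


class WorldModelLoop:
    """Hypothetical driver that keeps only the most recent image-action pairs."""

    def __init__(self, first_frame):
        # deque(maxlen=...) silently drops the oldest pair once it is full.
        self.context = deque(maxlen=CONTEXT_LEN)
        self.context.append((tokenize(first_frame), 0))

    def step(self, action):
        draft = backbone(self.context, action)
        tokens = refine(draft)
        self.context.append((tokens, action))
        return tokens


if __name__ == "__main__":
    loop = WorldModelLoop([1, 2, 3])
    for _ in range(20):
        frame_tokens = loop.step(action=1)
    print(len(loop.context), frame_tokens)
```

Anything older than the last nine pairs is no longer visible to the model, so in this sketch information about earlier frames is simply gone once it falls out of the buffer.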

Limitations Remain: Exploring the Future of AI Game Development

While WHAMM demonstrates exciting potential, it does not perfectly replicate the original Quake II. Because the training dataset is limited, the generated environment is only an approximation, which leads to several shortcomings. Enemy characters appear blurry, combat lacks realism, and health indicators are unreliable. Objects also disappear if they stay off-screen for more than 0.9 seconds, because the model's context window holds only the previous nine frames at roughly ten frames per second. The playable area is limited to one segment of the level, and the simulation stops once the end of that segment is reached. Input lag also remains relatively high, with a noticeable delay between player actions and system responses.

Microsoft views WHAMM as an experimental foundation for future AI-assisted game development. It is one of several emerging projects exploring how to apply generative AI to games. Similar efforts include GameGen-O (focused on generating open-world simulations), Google's GameNGen (which simulates DOOM), and DIAMOND (which has been used to simulate Counter-Strike). While these models have made significant progress, they still face technical limitations such as low-resolution output, limited memory, and weak context awareness.

The Gaming Industry Embraces AI: Potential for Cost Reduction and Efficiency Improvement

The gaming industry is especially receptive to generative AI because game development blends multiple disciplines (code, design, storytelling, and multimedia) and its cycles are often constrained by budget and time. This combination of creative complexity and resource pressure makes game production well suited to tools that can partially automate structured tasks.

Summary

Microsoft's WHAMM model, by generating a playable Quake II demo in real-time within an AI model, showcases the immense potential of generative AI in interactive entertainment. Although some limitations remain, WHAMM's technological breakthroughs, such as more efficient data learning and parallel image generation strategies, pave new avenues for future AI-driven game development.
