With the rapid development of artificial intelligence, humanity seems to be edging closer to the virtual world depicted in the movie "The Matrix." An AI-based world simulator named "The Matrix" was recently launched. Developed by an entirely Chinese team, it can generate high-fidelity 720p videos of realistic scenes indefinitely and supports real-time interaction.
The team released a 14-minute demonstration video, but the simulator can actually generate content continuously for up to an hour, covering scenes such as deserts, grasslands, bodies of water, and cities. Users steer the scene in real time with the W, A, S, D keys on their keyboard and see dynamic visuals at 16 frames per second.
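To make the real-time constraint concrete: at 16 frames per second the generator has 1000 / 16 = 62.5 ms to produce each frame, and an hour of continuous output is 3,600 × 16 = 57,600 frames. The Python sketch below works through that arithmetic and shows one possible way W, A, S, D key presses could be turned into per-frame control signals. The (forward, steer) action encoding and the helper names (`KEY_TO_ACTION`, `frame_budget_ms`, `frames_for_duration`, `actions_for_keys`) are hypothetical illustrations, not the project's actual interface.

```python
from typing import Dict, List, Tuple

FPS = 16  # frame rate reported for the demo

# Hypothetical (forward, steer) encoding for each key; an illustrative
# assumption, not the simulator's real control format.
KEY_TO_ACTION: Dict[str, Tuple[float, float]] = {
    "w": (1.0, 0.0),   # move forward
    "s": (-1.0, 0.0),  # move backward / brake
    "a": (0.0, -1.0),  # steer left
    "d": (0.0, 1.0),   # steer right
}


def frame_budget_ms(fps: int = FPS) -> float:
    """Time available per frame if generation must keep pace with playback."""
    return 1000.0 / fps


def frames_for_duration(seconds: int, fps: int = FPS) -> int:
    """Number of frames needed to cover a duration at the given frame rate."""
    return seconds * fps


def actions_for_keys(pressed: List[str]) -> List[Tuple[float, float]]:
    """Map one key per frame to an action; unknown keys mean 'no input'."""
    return [KEY_TO_ACTION.get(k.lower(), (0.0, 0.0)) for k in pressed]


if __name__ == "__main__":
    print(frame_budget_ms())             # 62.5
    print(frames_for_duration(60 * 60))  # 57600
    print(actions_for_keys(["W", "W", "D", "x"]))
```

Because each frame is tied to the key held at that moment, every input can influence the very next frame, which is the sense in which control is "frame-level."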
The "The Matrix" project team includes members from Alibaba, the University of Hong Kong, the University of Waterloo, and the Canadian AI research institute the Vector Institute. The name was inspired by a classic line from the film: "This is the world you know; it now exists only in what we call the Matrix, a neural interactive simulation system."
Prompt: admin@matrix: The scene depicts an urban environment where a long, straight road stretches beneath an elevated highway or bridge, flanked by fences indicating construction or restricted access. The street is marked with two yellow lines, and massive concrete pillars support the roadway above, casting shadows below. On the left wall, red digital numbers are visible, possibly used for monitoring or alerts, accompanied by construction materials and barricades, signifying active development. On the right side, infrastructure and a neon blue 'PAWN SHOP' sign indicate nearby commercial activity. Beyond the overpass, the road leads to tall modern buildings, their illuminated windows showcasing the vibrancy of the city landscape. Streetlights and digital displays provide limited lighting, adding to the futuristic feel. Despite signs of activity, the road is devoid of vehicles or pedestrians, contributing to a sense of silence. The portion of the sky outside the bridge contrasts with the shadows cast beneath it, while the surrounding construction and advanced architecture create an atmosphere of a city that is both evolving and futuristic.
The project's core highlight is its unprecedented frame-level control: every user action receives an instant response, making users feel as if they are truly present in the scene. Users can drive a car through deserts, forests, or cities from either a first-person or third-person perspective. Because it is trained on data from AAA games such as "Forza Horizon 5" and "Cyberpunk 2077," the system can generate scenes that are nearly indistinguishable from reality. Users can also watch one continuous video that transitions seamlessly between environments.
Beyond generating video of unlimited length with high visual quality, "The Matrix" also has zero-shot generalization capabilities: the simulator can understand and predict how objects behave and interact in environments for which it has no corresponding training data.
The simulator's training data consists mainly of supervised data from three AAA games plus a large amount of unsupervised real-world video. Unlike previous research, its key innovation lies in this ability to generalize, which lets it generate accurate scenes even in environments it has never seen.
For example, the simulator can show a "BMW X3 driving in an environment" or the striking image of a "car swimming in water."
From a technical perspective, "The Matrix" is built on three core modules: an interaction module, a window denoising process model, and a flow consistency model. The interaction module interprets user input and integrates it into video generation. The window denoising process model makes long-video generation feasible, addressing the bottleneck traditional models face when generating long sequences. Finally, the flow consistency model significantly speeds up inference, enabling real-time generation.
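The general idea behind window-based denoising for long videos can be illustrated with a toy sliding window: frames sit in a buffer at staggered noise levels, each step removes one noise level from all of them, the front frame leaves the window once it is clean, and a fresh noisy frame tagged with the current user input enters at the back. This is a simplified sketch under stated assumptions, not the project's implementation: `Latent`, `toy_denoise_step`, and `sliding_window_generate` are hypothetical names, and the "denoiser" merely decrements a counter where a real system would run a learned model.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, Iterator


@dataclass
class Latent:
    """Toy stand-in for one frame's latent (hypothetical, for illustration)."""
    action: str       # user input this frame is conditioned on
    noise_level: int  # denoising steps still needed; 0 means clean


def toy_denoise_step(latent: Latent) -> Latent:
    """Placeholder for a learned denoiser: removes exactly one noise level."""
    return Latent(latent.action, latent.noise_level - 1)


def sliding_window_generate(actions: Iterable[str], window: int = 4) -> Iterator[Latent]:
    """Emit one clean frame per input step once the window has filled.

    Frames in the buffer sit at staggered noise levels, so the stream can
    continue for as long as actions keep arriving.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    buf: Deque[Latent] = deque()
    for action in actions:
        # A new frame enters the back of the window as pure noise,
        # conditioned on the user's current input.
        buf.append(Latent(action, window))
        # Apply one denoising step to every frame in the window.
        buf = deque(toy_denoise_step(x) for x in buf)
        # The front frame is finished once its noise level reaches 0.
        if buf[0].noise_level == 0:
            yield buf.popleft()


if __name__ == "__main__":
    keys = ["w", "w", "a", "d", "s", "w"]
    for frame in sliding_window_generate(keys, window=4):
        print(frame)
```

In this toy, each frame receives exactly `window` denoising steps before it is emitted, so the window size also sets the startup delay (`window - 1` steps before the first frame appears). A sampler that needs only a few steps, which is the role a consistency model plays, would allow a smaller window and therefore less work per output frame; how "The Matrix" actually configures this is not described in the article.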
Project leaders Hongyang Zhang and Ruili Feng said they will continue to advance the technology, aiming to give users an even more realistic virtual experience.
Prompt: The video features a close-up of a woman inside a car, wearing oversized sunglasses and dressed in black.
Key Points:
🌐 The AI simulator "The Matrix" has launched, supporting the generation of 720p video of unlimited length.
🎮 Users can control video scenes in real-time, experiencing dynamic visuals at 16 frames per second.
🚀 This technology has zero-shot generalization capability, enabling it to predict the behavior of objects in environments it was not trained on.