Researchers from institutions such as the Hong Kong University of Science and Technology and the University of Science and Technology of China have recently released the GameGen-X model, a diffusion transformer designed specifically for generating and interactively controlling open-world game videos.
GameGen-X autonomously generates open-world game videos and simulates a range of game engine features, including novel characters, dynamic environments, complex actions, and diverse events. It can also respond to user input interactively, offering a taste of what it feels like to be a game designer.
A major highlight of GameGen-X is its interactive controllability. It can predict and alter future content based on the current game segment, thereby simulating gameplay.
Users can steer the generated content through multimodal control signals, such as structured text instructions and keyboard inputs, controlling both character actions and scene content.
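To make this concrete, here is a minimal sketch of how text and keyboard signals might be fused into a single conditioning vector for a diffusion transformer. This is not GameGen-X's released code; the module name, dimensions, and fusion scheme are illustrative assumptions.

```python
# Hypothetical sketch: fusing multimodal control signals into one
# conditioning vector. Names and dimensions are assumptions.
import torch
import torch.nn as nn

class ControlSignalFusion(nn.Module):
    def __init__(self, text_dim=768, num_keys=8, hidden_dim=1024):
        super().__init__()
        # Project a structured-text instruction embedding.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # Embed discrete keyboard actions (e.g. move, jump, attack).
        self.key_embed = nn.Embedding(num_keys, hidden_dim)

    def forward(self, text_emb, key_ids):
        # text_emb: (batch, text_dim), key_ids: (batch,)
        # Sum the two modalities into one vector that the video model
        # could attend to, e.g. via cross-attention.
        return self.text_proj(text_emb) + self.key_embed(key_ids)

fusion = ControlSignalFusion()
cond = fusion(torch.randn(2, 768), torch.tensor([0, 3]))
print(cond.shape)  # torch.Size([2, 1024])
```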
To train GameGen-X, the researchers also constructed the first large-scale open-world game video dataset, OGameData. It contains over one million video clips drawn from more than 150 games, with informative text descriptions generated by GPT-4o.
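A single entry in such a dataset might pair a clip with its GPT-4o caption. The field names below are a hypothetical sketch, not the released OGameData schema.

```python
# Hypothetical record layout for a captioned game-video clip.
from dataclasses import dataclass

@dataclass
class ClipRecord:
    game_title: str   # source game (OGameData spans 150+ titles)
    clip_path: str    # path to the extracted video clip
    caption: str      # informative description generated by GPT-4o

record = ClipRecord(
    game_title="ExampleOpenWorldGame",  # hypothetical title
    clip_path="clips/000001.mp4",
    caption="A knight rides through a rainy forest at dusk.",
)
print(record.caption)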
The training of GameGen-X is divided into two stages: pre-training of the base model and instruction fine-tuning. In the first stage, the model is pre-trained on text-to-video generation and video continuation tasks, enabling it to produce high-quality, long-sequence open-domain game videos.
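The sketch below illustrates how those two pre-training tasks could be mixed in a single diffusion training step. The toy model, fixed noise level, and tensor shapes are all assumptions for illustration, not GameGen-X's actual implementation.

```python
# Illustrative sketch: alternating text-to-video and video-continuation
# objectives in one training loop. Everything here is a toy stand-in.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVideoDenoiser(nn.Module):
    """Stand-in for the diffusion transformer backbone."""
    def __init__(self, channels=4):
        super().__init__()
        self.net = nn.Conv3d(channels, channels, 3, padding=1)

    def forward(self, noisy_latents, text_emb, prefix=None):
        # A real model would cross-attend to text_emb and to the clean
        # prefix frames; here we only illustrate the data flow.
        return self.net(noisy_latents)

def pretrain_step(model, latents, text_emb):
    # latents: (batch, channels, frames, height, width)
    if random.random() < 0.5:
        prefix = None                 # text-to-video: no clean context
    else:
        prefix = latents[:, :, :4]    # continuation: condition on prefix
    noise = torch.randn_like(latents)
    alpha = 0.7                       # toy fixed noise level
    noisy = alpha * latents + (1 - alpha ** 2) ** 0.5 * noise
    pred = model(noisy, text_emb, prefix)
    return F.mse_loss(pred, noise)    # standard noise-prediction loss

model = ToyVideoDenoiser()
loss = pretrain_step(model, torch.randn(2, 4, 16, 8, 8), torch.randn(2, 768))
print(loss.item())
```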
In the second stage, to achieve interactive controllability, the researchers designed the InstructNet module, which incorporates expert modules for game-related multimodal control signals.
InstructNet allows the model to adjust its latent representations based on user input, unifying character interaction and scene content control in video generation for the first time. During instruction fine-tuning, only InstructNet is updated while the pre-trained base model remains frozen, so the model gains interactive controllability without sacrificing the diversity and quality of the generated videos.
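This freeze-the-backbone setup is a standard PyTorch pattern, shown below with placeholder modules standing in for the pre-trained model and InstructNet. The residual-style adjustment is an assumption for illustration.

```python
# Hedged sketch of the fine-tuning setup: freeze the pre-trained
# backbone and optimize only an InstructNet-style adapter.
import torch
import torch.nn as nn

backbone = nn.Linear(1024, 1024)      # stands in for the pre-trained model
instruct_net = nn.Linear(1024, 1024)  # stands in for InstructNet

# Freeze every backbone parameter so fine-tuning cannot degrade the
# generation quality and diversity learned in stage one.
backbone.requires_grad_(False)

# The optimizer only ever sees InstructNet's parameters.
optimizer = torch.optim.AdamW(instruct_net.parameters(), lr=1e-4)

x = torch.randn(2, 1024)
hidden = backbone(x)
# InstructNet adjusts the frozen model's latent representations,
# sketched here as a residual correction.
adjusted = hidden + instruct_net(hidden)
loss = adjusted.pow(2).mean()
loss.backward()
optimizer.step()
print(any(p.grad is not None for p in backbone.parameters()))  # False
```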
Experimental results show that GameGen-X excels in generating high-quality game content and provides excellent control over environments and characters, outperforming other open-source and commercial models.
Of course, this AI is still in its early stages and has a long way to go before it can truly replace game designers. Its emergence nonetheless opens new possibilities for game development: it offers a fresh approach to designing and producing game content, and it demonstrates the potential of generative models as complements to traditional rendering pipelines, combining creative generation with interactive control.
Project link: https://gamegen-x.github.io/