Google Releases Foundation World Model Genie with 11 Billion Parameters

On February 26, the Google DeepMind team unveiled Genie, an 11-billion-parameter foundation world model that generates interactive virtual environments. Trained on video data, Genie can turn images and even sketches into playable virtual worlds. Unlike traditional approaches that rely on human-labeled actions for training, Genie learns without any action labels, inferring the features and patterns of different actions directly from video. The worlds it generates are cartoonish; it can simulate robot movements and object transformations, but its output is closer to animated GIFs than to Sora's video. Google notes that learning fine-grained control from internet videos is challenging, yet Genie achieves it, and it can simulate a range of plausible actions, inferring different movements from the generated environment.
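The label-free training idea described above can be illustrated with a toy sketch. This is not Genie's actual architecture (which uses a learned video tokenizer and a latent action model); here, as an assumed stand-in, frame-to-frame differences are clustered into a small discrete codebook, so each cluster index plays the role of an unsupervised "latent action", and a one-step "dynamics model" applies the chosen action's delta to predict the next frame. All function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def infer_latent_actions(frames, n_actions=4, iters=20):
    """Crude stand-in for a latent action model: cluster frame-to-frame
    deltas into a small discrete codebook, using no action labels."""
    deltas = frames[1:] - frames[:-1]              # one delta per transition
    flat = deltas.reshape(len(deltas), -1)
    # k-means over transitions; each cluster id acts as a "latent action"
    codebook = flat[rng.choice(len(flat), n_actions, replace=False)]
    for _ in range(iters):
        dist = ((flat[:, None, :] - codebook[None]) ** 2).sum(-1)
        codes = dist.argmin(1)                     # assign each transition
        for k in range(n_actions):                 # update cluster centers
            if (codes == k).any():
                codebook[k] = flat[codes == k].mean(0)
    return codes, codebook

def predict_next(frame, code, codebook):
    """One-step 'dynamics model': apply the chosen latent action's delta."""
    return frame + codebook[code].reshape(frame.shape)

# Toy "video": a dot stepping rightward along a 1-D image
T, D = 12, 8
frames = np.zeros((T, D))
for t in range(T):
    frames[t, t % D] = 1.0

codes, codebook = infer_latent_actions(frames, n_actions=2)
pred = predict_next(frames[0], codes[0], codebook)
```

The point of the sketch is that both the action codes and the dynamics come purely from reconstruction of unlabeled video, which is the property the article attributes to Genie; the real model does this at scale with learned neural encoders rather than clustering.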

This article is from AIbase Daily
Welcome to the [AI Daily] column! This is your daily guide to the world of artificial intelligence. Every day we cover hot topics in AI with a focus on developers, helping you track technical trends and learn about innovative AI products and applications.