On February 26th, the Google DeepMind team unveiled Genie, an 11-billion-parameter foundation world model that generates virtual, interactive environments. Trained purely on video data, Genie can turn images and even sketches into manipulable virtual worlds. Unlike traditional approaches, which require human-labeled data for training, Genie is trained without any action labels, meaning it must autonomously identify the features and patterns of different actions from the videos themselves. The videos Genie generates are cartoonish, capable of simulating robot movements and transforming objects, and lean more toward animated GIFs than Sora's output. Google states that learning fine-grained control from internet videos is challenging, yet Genie achieves it: the model can simulate a variety of plausible actions, deducing different movements within the environments it generates.
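To make the action-label-free training idea concrete, here is a minimal sketch of a latent action model: an encoder infers a discrete "latent action" from two consecutive video frames, and a dynamics model must predict the next frame from the current frame plus that inferred action, so a small action vocabulary emerges without any labels. This is not DeepMind's code; all names and dimensions are illustrative assumptions, and for brevity it uses a straight-through Gumbel-softmax where the Genie paper describes a VQ-VAE-style codebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ACTIONS = 8       # Genie uses a small discrete latent action vocabulary
FRAME_DIM = 64 * 64   # toy flattened grayscale frames; the real model uses video tokens

class LatentActionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Infers which of NUM_ACTIONS latent actions turned frame t into frame t+1.
        self.action_encoder = nn.Sequential(
            nn.Linear(2 * FRAME_DIM, 256), nn.ReLU(),
            nn.Linear(256, NUM_ACTIONS),
        )
        # Learned embedding for each discrete latent action.
        self.action_codebook = nn.Embedding(NUM_ACTIONS, 32)
        # Predicts frame t+1 from frame t and the chosen latent action.
        self.dynamics = nn.Sequential(
            nn.Linear(FRAME_DIM + 32, 256), nn.ReLU(),
            nn.Linear(256, FRAME_DIM),
        )

    def forward(self, frame_t, frame_t1):
        logits = self.action_encoder(torch.cat([frame_t, frame_t1], dim=-1))
        # Straight-through Gumbel-softmax: discrete action choice, yet differentiable.
        one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)
        action_emb = one_hot @ self.action_codebook.weight
        pred_t1 = self.dynamics(torch.cat([frame_t, action_emb], dim=-1))
        return pred_t1, one_hot

# Training step sketch: the only supervision is pairs of frames from video.
model = LatentActionModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
frame_t = torch.rand(16, FRAME_DIM)    # stand-in for real video frames
frame_t1 = torch.rand(16, FRAME_DIM)
pred_t1, actions = model(frame_t, frame_t1)
loss = F.mse_loss(pred_t1, frame_t1)   # reconstruct the next frame
loss.backward()
opt.step()
```

Because the dynamics model only sees the latent action as a bottleneck, the encoder is pushed to encode whatever best explains the change between frames (a jump, a step left), which is why such a model can later expose those latent actions as player controls in a generated world.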