A recent study led by Yann LeCun, Chief AI Scientist at Meta, shows how artificial intelligence can develop a basic understanding of physics simply by watching videos. The research, carried out jointly by scientists from Meta FAIR, the University of Paris, and EHESS, demonstrates that AI systems can acquire intuitive physical knowledge through self-supervised learning, without predefined rules.
The research team used a method called Video Joint Embedding Predictive Architecture (V-JEPA), which processes information in a way closer to the human brain than generative AI models such as OpenAI's Sora. Rather than trying to predict every pixel, V-JEPA makes its predictions in an abstract representation space, and this is what allows the system to learn basic physical concepts.
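To make that distinction concrete, here is a minimal, hypothetical sketch of the JEPA idea in PyTorch. The class and layer names are illustrative placeholders (simple linear layers standing in for real video transformers), not Meta's actual V-JEPA implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the JEPA idea: predict masked video content in an
# abstract representation space rather than in pixel space.
# All module names below are hypothetical placeholders.

class ToyJEPA(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # Encoders map video patches to abstract features; here they are
        # stand-in linear layers instead of real video encoders.
        self.context_encoder = nn.Linear(dim, dim)
        self.target_encoder = nn.Linear(dim, dim)
        self.predictor = nn.Linear(dim, dim)

    def forward(self, visible_patches, masked_patches):
        # Encode the visible (context) part of the video.
        context = self.context_encoder(visible_patches)
        # Predict the representation of the masked part...
        predicted = self.predictor(context)
        # ...and compare it to the target representation,
        # not to raw pixels (no pixel-level reconstruction loss).
        with torch.no_grad():
            target = self.target_encoder(masked_patches)
        return nn.functional.mse_loss(predicted, target)

# Toy usage: a batch of 8 "patch" feature vectors.
loss = ToyJEPA()(torch.randn(8, 256), torch.randn(8, 256))
```

The key design point is that the loss compares predicted and target representations, so the model is never forced to reconstruct fine pixel detail the way a generative video model is.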
In the study, the team adapted an evaluation method from developmental psychology known as "violation of expectation," originally used to test infants' understanding of physics. Researchers showed the AI two closely matched scenarios, one physically possible and one physically impossible (for example, a ball passing through a wall). By measuring how strongly the model reacted to the impossible events, they assessed its grasp of physics.
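As an illustration of how such a "surprise" readout can work in code, here is a minimal, hypothetical sketch; the `surprise` helper and the model interface are assumptions made for this example (a model that returns a scalar prediction error for a clip), not the study's actual evaluation code.

```python
import torch

def surprise(model, clip):
    # Hypothetical readout: the model's prediction error on a clip is used
    # as a proxy for how "surprised" it is (higher error = more surprising).
    with torch.no_grad():
        return model(clip).item()

def expectation_violated(model, possible_clip, impossible_clip):
    # The model behaves as if a physical expectation was violated when it is
    # more surprised by the impossible clip than by the matched possible one.
    return surprise(model, impossible_clip) > surprise(model, possible_clip)

# Toy demo with a stand-in "model" that scores mean squared activation.
fake_model = lambda clip: clip.pow(2).mean()
a, b = torch.randn(3, 16), torch.randn(3, 16) * 2
print(expectation_violated(fake_model, a, b))
```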
V-JEPA was tested on three datasets: IntPhys (basic physical concepts), GRASP (complex interactions), and InfLevel (realistic environments). The results showed that V-JEPA excelled at object permanence, continuity, and shape constancy, whereas large multimodal language models such as Gemini 1.5 Pro and Qwen2-VL-72B performed at roughly chance level.
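To make the "chance level" baseline concrete: in a pairwise setup like the one described above, a model that cannot tell the two clips apart lands near 50% accuracy. The sketch below is a hypothetical aggregation over pre-computed surprise scores; the variable names and data format are illustrative, not those of the actual benchmarks.

```python
def pairwise_accuracy(scored_pairs):
    """scored_pairs: list of (surprise_possible, surprise_impossible) floats.
    Counts a hit whenever the impossible clip received the higher surprise
    score. A model with no physical understanding hovers around 0.5."""
    hits = sum(imp > pos for pos, imp in scored_pairs)
    return hits / len(scored_pairs)

# Example: 3 of 4 pairs scored correctly -> 0.75, well above chance.
print(pairwise_accuracy([(0.1, 0.9), (0.2, 0.8), (0.4, 0.3), (0.1, 0.6)]))
```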
V-JEPA's learning is also notably efficient: the system needed only 128 hours of video to grasp basic physical concepts, and even a small model with 115 million parameters performed strongly. The research indicates that V-JEPA can recognize motion patterns and reliably flag physically implausible events, laying the groundwork for AI that genuinely understands the world.
This study challenges a common assumption in AI research: that systems need predefined "core knowledge" to understand physical laws. The V-JEPA findings suggest that observational learning alone can give AI such knowledge, much as infants, primates, and even young birds come to understand physics. The work aligns with Meta's long-term goals for the JEPA architecture: building comprehensive world models that let autonomous AI systems understand their environments more deeply.
Key Takeaways:
🧠 The study shows AI learns physics knowledge through videos without predefined rules.
📊 V-JEPA outperforms large multimodal language models in understanding physics, demonstrating stronger learning from observation.
🌍 Meta is pushing a new direction in AI development, aiming to build more comprehensive world models of the environment.