Researchers from ByteDance Research and Tsinghua University recently released a joint study showing that current AI video generation models, such as OpenAI's Sora, have significant flaws in understanding basic physical laws, even while producing stunning visual effects. The study has sparked widespread discussion about how well AI can simulate reality.

The research team tested AI video generation models under three scenarios: predictions under known patterns, predictions under unknown patterns, and novel combinations of familiar elements. Their goal was to determine whether these models had truly learned physical laws or were merely relying on surface features of their training data.
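To make the three evaluation regimes concrete, here is a minimal, hypothetical sketch of how such a test split could be constructed for a simple moving-object dataset. The attribute values and the `make_clip` helper are illustrative assumptions, not the paper's actual setup.

```python
import random

# Hypothetical attribute spaces for a toy moving-object dataset.
COLORS = ["red", "blue", "green"]
SHAPES = ["ball", "square"]

def make_clip(color, shape, speed):
    """Stand-in for rendering a short video clip of a moving object."""
    return {"color": color, "shape": shape, "speed": speed}

# Training distribution: only fast motion, and never a green square.
train = [make_clip(c, s, random.uniform(4.0, 8.0))
         for c in COLORS for s in SHAPES
         if not (c == "green" and s == "square")]

# 1) Known pattern: attributes drawn from the training ranges.
test_known = make_clip("red", "ball", 6.0)

# 2) Unknown pattern: a speed far below anything seen in training.
test_unknown = make_clip("red", "ball", 1.0)

# 3) New combination: familiar attributes in an unseen pairing.
test_combo = make_clip("green", "square", 6.0)

# The unknown-pattern clip lies outside the training speed range,
# and the combination clip pairs attributes never seen together.
assert test_unknown["speed"] < min(c["speed"] for c in train)
assert ("green", "square") not in {(c["color"], c["shape"]) for c in train}
```

A model that had learned the underlying dynamics should handle all three cases; one that memorized training examples would be expected to fail on cases 2 and 3.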

Through these tests, the researchers found that the models did not learn universally applicable rules. Instead, when generating videos they relied primarily on surface features of the training data, following a strict hierarchy: color takes precedence, followed by size, then speed, then shape.
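One way to picture this "case-based" behavior is as a prioritized nearest-neighbor lookup over training examples. The following toy sketch is an illustration of the reported hierarchy, not the paper's method; the training cases and attributes are invented for the example.

```python
# Toy illustration of a strict attribute hierarchy: a color mismatch
# outweighs any agreement on size, which outweighs speed, then shape.

TRAIN_CASES = [
    {"color": "red",  "size": 2, "speed": 5.0, "shape": "ball"},
    {"color": "red",  "size": 4, "speed": 1.0, "shape": "square"},
    {"color": "blue", "size": 2, "speed": 5.0, "shape": "square"},
]

def mismatch_key(query, case):
    # Tuples compare lexicographically, so earlier attributes dominate.
    return (
        query["color"] != case["color"],
        abs(query["size"] - case["size"]),
        abs(query["speed"] - case["speed"]),
        query["shape"] != case["shape"],
    )

def retrieve(query):
    """Return the training case such a model would imitate."""
    return min(TRAIN_CASES, key=lambda case: mismatch_key(query, case))

# A red, small, fast square is matched to a red ball, even though a blue
# case agrees on size, speed, AND shape: color dominates the hierarchy.
query = {"color": "red", "size": 2, "speed": 5.0, "shape": "square"}
print(retrieve(query))  # the red ball case wins
```

Under this kind of matching, a model can look flawless whenever a close training case exists, yet produce physically wrong output the moment the best match differs in an attribute that actually matters for the dynamics.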

In familiar scenarios, these models performed almost perfectly, but they broke down when faced with situations outside their training distribution, as one of the study's object-motion tests made especially clear.

The researchers pointed out that simply increasing model size or adding more training data does not solve the problem. Although larger models perform better on familiar patterns and combinations, they still fail to grasp basic physical laws or handle scenarios outside their training range. Co-author Bingyi Kang noted, "If the data coverage is good enough in specific scenarios, it might form an overfitted world model." Such a model, however, does not meet the true definition of a world model, since a genuine world model should generalize beyond its training data.

Kang demonstrated this limitation on X: the team trained the model on a fast-moving ball traveling left to right and back, then tested it with a slow-moving ball. After just a few frames, the generated ball suddenly changed direction (visible at 1 minute 55 seconds in the video).

The findings pose a challenge to OpenAI's Sora project. OpenAI has claimed that Sora could evolve into a true world model through continued scaling, and has even asserted that it already has a basic understanding of physical interactions and three-dimensional geometry. The researchers counter that scaling alone is insufficient for video generation models to discover fundamental physical laws.

Yann LeCun, Meta's chief AI scientist, has also expressed skepticism, arguing that generating pixels to predict the world is "a waste of time and doomed to fail." Nevertheless, many still have high hopes for Sora, which OpenAI first previewed in mid-February 2024 as a showcase of its video generation potential.

Key Points:

🌟 The study found that AI video generation models have significant flaws in understanding physical laws, relying on surface features of training data.

⚡ Increasing model size does not solve the problem; these models perform poorly in unknown scenarios.

🎥 OpenAI's Sora project faces challenges, as simply scaling up cannot achieve a true world model.