Can artificial intelligence models truly remember, think, plan, and reason like humans? Some AI labs suggest that "human-level AI" is not far off, but Yann LeCun, Meta's chief AI scientist, has thrown cold water on the idea. He believes the goal may still be a decade away, and that the key to reaching it is the "world model."
Earlier this year, OpenAI introduced a feature that it says lets ChatGPT "remember" conversations with users. Its latest model displays the word "thinking" while generating output, and OpenAI says the model is capable of "complex reasoning."
It sounds like we are on the brink of the era of AGI (artificial general intelligence). However, at the recent Hudson Forum, LeCun pushed back on the optimistic views of figures like Elon Musk, founder of xAI, and Shane Legg, co-founder of Google DeepMind, who believe that human-level AI is imminent.
LeCun said: "We need machines that can understand the world; machines with memory, intuition, common sense, and the ability to reason and plan like humans." He stressed that, whatever AI's most enthusiastic proponents often claim, current AI systems are still far from this level. He even stated that true human-level AI might take "several years to several decades" to achieve.
So, what's the problem? It's quite simple: today's large language models (LLMs) work by predicting the next token (usually a few letters or a short word), while current image and video models predict the next pixel. In other words, language models make predictions in one dimension, and image and video models in two. Although these models perform well in their own domains, they do not understand the three-dimensional world.
As a result, modern AI systems cannot accomplish many simple tasks that humans handle easily. LeCun noted that humans learn to clear the table by the age of ten and to drive by seventeen, and can pick up either skill in a matter of hours. Yet even the most advanced AI systems, trained on thousands or millions of hours of data, still cannot operate reliably in the real world.
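To make "predicting the next word" concrete, here is a minimal sketch. It is an illustration only: it assumes a toy model that counts which word follows which in a tiny made-up corpus, whereas real LLMs use neural networks over sub-word tokens. The names `train_bigram` and `generate` are hypothetical and come from no particular library.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which tokens follow it (a toy stand-in for an LLM)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length):
    """Repeatedly append the most frequent follower of the last token."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:  # no known continuation: stop early
            break
        out.append(followers.most_common(1)[0][0])
    return out

# Made-up corpus, assumed purely for illustration.
corpus = "the cat sat on the mat and the cat sat on the rug".split()
model = train_bigram(corpus)
print(" ".join(generate(model, "the", 4)))  # prints: the cat sat on the
```

The model only ever looks along a single sequence, which is the sense in which such a predictor is one-dimensional: nothing in it represents the room the cat is sitting in.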
To handle more complex tasks, LeCun believes we need to build three-dimensional models that can perceive the world around them, built around a new type of AI architecture: the world model. He explained: "A world model is a mental model of how the world behaves." You can imagine a series of actions you might take, and your world model lets you predict the effect of those actions on the world.
For example, imagine you see a messy bedroom and want to clean it. You naturally think of picking up all the clothes and putting them away. You don't need to try multiple methods or learn how to clean a room first. Your brain observes the three-dimensional space and directly formulates an action plan that achieves the goal on the first try. That action plan is the "secret weapon" that AI world models promise.
Another advantage of world models is that they can take in far more data than LLMs. That also makes them computationally intensive, which is why major cloud service providers are racing to partner with AI companies.
World models are the big idea that multiple AI labs are now chasing, and the term has quickly become a hot topic for attracting venture capital. A group of prominent AI researchers, including Fei-Fei Li, often called the "godmother of AI," and Justin Johnson, have just raised $230 million for their startup, World Labs. Li and her team firmly believe that world models will unlock smarter AI systems. OpenAI also describes its unreleased Sora video generator as a world model, but has not disclosed the details.
LeCun outlined the idea of using world models to create human-level AI in a 2022 paper on "goal-driven AI," although he noted that the concept is more than 60 years old. In short, the world model takes in a basic representation of the world (such as a video of a dirty room) along with memory. The model then predicts how the world will change based on this information. Next, you give the world model goals, including the desired state of the world (such as a clean room), and set up "safeguards" to ensure that the model does not harm humans while pursuing those goals (for example: do not hurt me while cleaning the room). Finally, the world model finds a sequence of actions that achieves these goals.
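The steps above (predict, set a goal, apply safeguards, search for actions) can be sketched in a few lines. This is a toy illustration under heavy assumptions, not LeCun's or Meta's method: the "world model" here is a hand-written `predict` function rather than a learned one, the world is a list of items on the floor, and every name (`State`, `predict`, `violates_safeguard`, `plan`) is hypothetical.

```python
from collections import deque
from typing import NamedTuple

class State(NamedTuple):
    items_on_floor: frozenset
    human_harmed: bool

def predict(state, action):
    """Hand-written stand-in for a learned world model: next state for an action."""
    kind, item = action.split(":")
    if item not in state.items_on_floor:
        return state
    remaining = state.items_on_floor - {item}
    if kind == "put_away":
        return State(remaining, state.human_harmed)
    if kind == "throw":  # clears the floor, but assumed to hurt someone
        return State(remaining, True)
    return state

def goal_reached(state):
    """Desired state of the world: a clean room."""
    return not state.items_on_floor

def violates_safeguard(state):
    """Safeguard: never enter a state in which a human is harmed."""
    return state.human_harmed

def plan(start, max_steps=5):
    """Breadth-first search over imagined action sequences."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if goal_reached(state):
            return actions
        if len(actions) == max_steps:
            continue
        for item in sorted(state.items_on_floor):
            for kind in ("throw", "put_away"):
                action = f"{kind}:{item}"
                nxt = predict(state, action)  # imagine the outcome first
                if violates_safeguard(nxt) or nxt in seen:
                    continue
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None  # no safe plan within the step budget

messy = State(frozenset({"shirt", "socks"}), False)
print(plan(messy))  # prints: ['put_away:shirt', 'put_away:socks']
```

The planner never executes an action to see what happens; it only queries `predict`, which is the role the article assigns to the world model. Breadth-first search is just the simplest choice for a toy state space and would not scale to real video input.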
Meta's long-term AI research lab, FAIR (Fundamental AI Research), is actively working on goal-driven AI and world models, LeCun said. FAIR used to do AI research for Meta's upcoming products, but LeCun said the lab has shifted in recent years to focus on long-term AI research, and that it no longer uses LLMs.
Although the concept of world models is fascinating, LeCun admitted that we have not made much progress in turning these systems into reality. Many difficult problems remain to be solved, he said, and it will take years, if not a decade, to get everything working. Meanwhile, his boss, Mark Zuckerberg, keeps asking when that will be.