The highly anticipated GPT-5 project (codenamed Orion) has been in development for over 18 months but has yet to be released. According to a recent report from The Wall Street Journal, insiders say that while Orion outperforms OpenAI's existing models, the gains are not large enough to justify the project's enormous ongoing costs. Even more concerning, a global shortage of training data may be the biggest obstacle to GPT-5 reaching a higher level of intelligence.
It is reported that GPT-5 has undergone at least two large training runs, each of which surfaced new problems and fell short of researchers' expectations. Each run takes several months, with costs reaching as much as $500 million. Both the project's success and its timeline remain uncertain.
Challenges in Training: Data Bottleneck Emerges
Since the release of GPT-4 in March 2023, OpenAI has been developing GPT-5. Typically, an AI model's capabilities improve as the volume of data it consumes increases. The training process requires massive amounts of data, takes months, and relies on large numbers of expensive computing chips. OpenAI CEO Sam Altman has revealed that training GPT-4 cost more than $100 million, and that future AI model training runs are expected to cost over $1 billion.
To mitigate risks, OpenAI usually conducts small-scale trials to validate a model's feasibility before committing to a full run. However, the development of GPT-5 faced challenges from the outset. In mid-2023, OpenAI launched an experimental training run called "Arrakis" to test GPT-5's new design. The training progressed slowly and proved costly, and the results indicated that developing GPT-5 would be more complex and difficult than originally anticipated.
As a result, OpenAI's research team decided to make a series of technical adjustments to Orion and realized that the existing public internet data could no longer meet the model's needs. To enhance GPT-5's performance, they urgently require a wider variety of higher-quality data.
"Creating Data from Scratch": Addressing the Data Shortage
To tackle the issue of insufficient data, OpenAI has decided to "create data from scratch." They have hired software engineers and mathematicians to write new software code or solve mathematical problems, allowing Orion to learn from these tasks. OpenAI also engages these experts to explain their work processes, converting human wisdom into knowledge that machines can learn.
Many researchers believe that code, as the language of software, can help large models solve problems they have not encountered before. Jonathan Siddharth, CEO of Turing, stated, "We are transferring human wisdom from the human brain to the machine brain."
OpenAI has even collaborated with experts from fields like theoretical physics, asking them to explain how they would solve hard problems in their respective domains. However, this "creating data from scratch" approach is not very efficient. The training data for GPT-4 amounted to approximately 13 trillion tokens; even with 1,000 people writing 5,000 words daily, it would take months to produce just 1 billion tokens.
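The throughput claim above can be checked with a quick back-of-the-envelope calculation. The words-to-tokens ratio is an assumption on my part (roughly 4/3 tokens per English word, a common heuristic); the writer counts come from the article.

```python
# Sanity check: how long would 1,000 writers at 5,000 words/day
# take to produce 1 billion tokens of training data?

writers = 1_000
words_per_writer_per_day = 5_000
tokens_per_word = 4 / 3  # rough heuristic for English text (assumption)

tokens_per_day = writers * words_per_writer_per_day * tokens_per_word
days_for_one_billion = 1_000_000_000 / tokens_per_day

print(f"~{tokens_per_day:,.0f} tokens/day")
print(f"~{days_for_one_billion:.0f} days "
      f"(~{days_for_one_billion / 30:.1f} months) for 1 billion tokens")
```

Under these assumptions the effort comes to roughly 150 days, about five months, which is consistent with the article's "it would take months" and underscores how tiny 1 billion tokens is next to GPT-4's reported ~13 trillion.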
To accelerate training, OpenAI has also attempted to use AI-generated "synthetic data." However, research has shown that using AI-generated data for AI training can sometimes lead to errors or nonsensical outputs. In response, OpenAI scientists believe that using data generated by o1 can help avoid these issues.
Internal and External Challenges: OpenAI Faces Multiple Obstacles
OpenAI is facing not only technical challenges but also internal turmoil and talent poaching by competitors, and the dual pressures of technology and funding are mounting. Each training run costs up to $500 million, and total training costs are likely to exceed $1 billion. Meanwhile, competitors such as Anthropic and Google are launching next-generation models in an attempt to surpass OpenAI.
Talent loss and internal disagreements have further slowed development. Last year, the OpenAI board abruptly fired Altman, leading some researchers to question the company's future. Although Altman was quickly reinstated as CEO and began reforming the corporate governance structure, more than 20 key executives, researchers, and long-term employees, including co-founder and chief scientist Ilya Sutskever and chief technology officer Mira Murati, have left since the beginning of this year.
As the Orion project stalls, OpenAI has started developing other projects and applications, including a simplified version of GPT-4 and an AI video generation product called Sora. However, this has resulted in competition for limited computing resources among different teams, particularly intense competition between the new product development team and the Orion research team.
Bottlenecks in AI Development? The Industry Faces Deep Reflection
The predicament of GPT-5 may point to a larger industry question: is AI development approaching a bottleneck? Industry insiders note that the strategy of relying on ever more data and ever larger models is gradually losing its effectiveness. Former OpenAI chief scientist Sutskever has stated, "We have only one internet": data growth is slowing, and the "fossil fuel" that has driven AI's leaps is gradually being depleted.
Regarding the future of GPT-5, Altman has not provided a clear timeline. It remains unclear when, or whether, OpenAI will launch a model worthy of the GPT-5 name. The dilemma surrounding GPT-5 has also prompted deeper reflection on the future direction of AI development.