OpenAI's latest model, o3, has achieved remarkable results on the ARC-AGI benchmark, scoring 75.7% under standard compute conditions, while the high-compute configuration reached an impressive 87.5%. The achievement has surprised the AI research community, but it still does not prove that artificial general intelligence (AGI) has been achieved.

The ARC-AGI benchmark is built on the Abstraction and Reasoning Corpus (ARC), which evaluates an AI system's ability to adapt to novel tasks and demonstrate fluid intelligence. ARC consists of a series of visual puzzles that require understanding basic concepts such as objects, boundaries, and spatial relationships. Humans solve these puzzles easily, while current AI systems struggle with them, which is why ARC is considered one of the most challenging benchmarks in AI evaluation.
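For concreteness, here is a minimal sketch of what an ARC task looks like. The JSON layout, "train" and "test" lists of paired integer grids with values 0-9 encoding colors, follows the public ARC dataset; the specific puzzle and the `solve` function below are invented for illustration.

```python
# A toy ARC-style task. The hypothetical rule: mirror each grid horizontally.
task = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 4], [0, 0]], "output": [[4, 3], [0, 0]]},
    ],
    "test": [
        {"input": [[5, 0], [0, 6]]},  # a solver must produce [[0, 5], [6, 0]]
    ],
}

def solve(grid):
    """Candidate program inferred from the training pairs."""
    return [list(reversed(row)) for row in grid]

# Verify the candidate against every training pair before trusting it.
assert all(solve(p["input"]) == p["output"] for p in task["train"])
print(solve(task["test"][0]["input"]))  # -> [[0, 5], [6, 0]]
```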


o3's performance significantly surpasses previous models: the best that the o1-preview and o1 models managed on ARC-AGI was 32%. Before o3, researcher Jeremy Berman held the top score of 53%, achieved by combining Claude 3.5 Sonnet with genetic algorithms; o3's result is therefore being viewed as a leap in AI capabilities.
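Berman's published write-up describes evolving LLM-generated Python transformation functions and scoring them against a task's training pairs. The loop below is a stripped-down sketch of that idea, not his actual code: `llm_propose` is a stand-in for the real Claude API call, and all names here are illustrative assumptions.

```python
import random

def llm_propose(task, parents):
    """Stand-in for the model call. In Berman's setup, this step asks
    Claude 3.5 Sonnet to write (and mutate/recombine) Python transformation
    functions; here it samples from a fixed pool of toy candidates so the
    loop runs standalone."""
    pool = [
        lambda g: g,                                   # identity
        lambda g: [list(reversed(row)) for row in g],  # horizontal mirror
        lambda g: [list(col) for col in zip(*g)],      # transpose
    ]
    return random.sample(pool, 2)

def fitness(program, train_pairs):
    """Fraction of training pairs the candidate reproduces exactly."""
    hits = 0
    for pair in train_pairs:
        try:
            hits += program(pair["input"]) == pair["output"]
        except Exception:
            pass  # a crashing candidate simply scores zero on that pair
    return hits / len(train_pairs)

def evolve(task, generations=5, keep=2):
    """Propose candidates, keep the fittest, stop on a perfect solver."""
    population = []
    for _ in range(generations):
        population += llm_propose(task, population[:keep])
        population.sort(key=lambda p: fitness(p, task["train"]), reverse=True)
        if fitness(population[0], task["train"]) == 1.0:
            break  # passes every training pair: apply it to the test input
    return population[0]

# With the toy `task` from the earlier sketch:
#   best = evolve(task)
#   print(best(task["test"][0]["input"]))
```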

François Chollet, the creator of ARC, praised o3's capabilities as transformative for AI, saying it demonstrates an unprecedented level of adaptability to new tasks.

Despite o3's outstanding performance, its computational costs are steep. In the low-compute configuration, solving each puzzle costs $17 to $20 and consumes 33 million tokens; in the high-compute configuration, the compute budget grows by a factor of roughly 172, consuming billions of tokens. As inference costs continue to fall, however, these expenses may become more manageable.
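Taking the reported figures at face value, the scaling is easy to work out. The numbers below are assumptions drawn directly from the article, not measured costs.

```python
# Back-of-the-envelope scaling, assuming the article's figures:
# roughly $20 and 33M tokens per task at low compute, and a 172x
# multiplier for the high-compute configuration.
low_cost_per_task = 20        # USD, upper end of the reported range
low_tokens_per_task = 33e6    # tokens per task, as reported
scale = 172

print(f"high-compute cost/task   ~ ${low_cost_per_task * scale:,.0f}")   # ~ $3,440
print(f"high-compute tokens/task ~ {low_tokens_per_task * scale:.1e}")   # ~ 5.7e9, i.e. billions
```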


There is currently no detailed public information on how o3 achieved this breakthrough. Some researchers speculate that o3 uses a form of program synthesis that combines chain-of-thought reasoning with a search mechanism; others believe it may simply be a further scaling-up of reinforcement learning.
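If the chain-of-thought-plus-search speculation is right, the core mechanism might resemble best-of-n selection over sampled reasoning paths. The sketch below only illustrates that hypothesis: `sample_candidates` and `score` are hypothetical stand-ins, since nothing public confirms how o3 actually works.

```python
import heapq

def sample_candidates(prompt, n):
    """Stand-in for sampling n chain-of-thought completions from a model.
    Purely hypothetical: a real system would call an LLM here."""
    return [f"reasoning path {i} for: {prompt}" for i in range(n)]

def score(candidate):
    """Stand-in verifier/evaluator that rates a reasoning path.
    A real system might score consistency with the task's training pairs."""
    return -len(candidate)  # toy heuristic: prefer shorter chains

def best_of_n(prompt, n=8, k=3):
    """Sample n candidate reasoning paths and keep the k best-scoring."""
    return heapq.nlargest(k, sample_candidates(prompt, n), key=score)

print(best_of_n("mirror the grid horizontally")[0])
```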


Although o3 has made significant progress on ARC-AGI, Chollet emphasizes that ARC-AGI is not a test for AGI, and that o3 does not yet meet the bar. It still fails on some very simple tasks, revealing fundamental differences from human intelligence. Additionally, o3 continues to rely on external verification during reasoning, which is far from the independent learning one would expect of AGI.

Chollet's team is developing a new, more challenging benchmark to test o3's capabilities, one expected to drop its score below 30%. He noted that we will know true AGI has arrived when it becomes impossible to design tasks that are simple for ordinary people but nearly impossible for AI.

Key Points:  

🌟 o3 scored 75.7% on the ARC-AGI benchmark (87.5% with high compute), far outperforming previous models.  

💰 Even in the low-compute setting, o3 costs $17 to $20 per puzzle, with substantial computational demands.  

🚫 Despite o3's excellent performance, experts emphasize that it has not yet reached AGI standards.