OpenAI announced its next-generation reasoning models, o3 and the smaller o3-mini, during its 12-day launch event. The models succeed the o1 series and are designed to reason at greater length before answering, with the aim of improving accuracy.
The o3 model performed strongly on the ARC-AGI benchmark and was described as the first AI model to surpass it, demonstrating problem-solving capabilities close to human level. On this benchmark the o3 series scores at least 75.7%, and with additional computational resources the score rises to 87.5%.
The o3-mini model aims for faster, cheaper reasoning while maintaining performance, which makes it particularly suitable for programming tasks. OpenAI plans to launch o3-mini around the end of January, followed shortly by the full o3 model. Neither model is publicly available yet: both will undergo safety testing first, and OpenAI has begun accepting applications from safety researchers for preview access to o3 and o3-mini.
The o3 model also shows substantial gains in programming and mathematical problem-solving. On the SWE-bench Verified benchmark, o3 reached an accuracy of about 71.7%, more than 20 percentage points above o1. In competitive programming, o3 earned an Elo rating of 2727, compared with 1891 for o1. In competition mathematics, o3's accuracy reached 96.7%, and on GPQA Diamond it reached 87.7%, nearly 10 percentage points above o1.
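To give a sense of what the gap between the two Elo figures means, the sketch below applies the standard Elo expected-score formula. This is an illustration only: it assumes the classic logistic formula with a 400-point scale, which may differ from the rating system actually used in the evaluation, and the function name `expected_score` is hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the classic Elo model.

    Assumption: the standard logistic formula with a 400-point scale.
    The rating system behind the reported figures may differ in detail.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings as reported in the text above.
O3_RATING = 2727
O1_RATING = 1891

gap = O3_RATING - O1_RATING
p = expected_score(O3_RATING, O1_RATING)
print(f"Rating gap: {gap} points")
print(f"Expected score of the higher-rated side: {p:.3f}")
```

Under this assumption, an 836-point gap corresponds to an expected score above 0.99 for the higher-rated side in a head-to-head comparison.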
OpenAI also introduced a new safety approach, deliberative alignment, a paradigm for directly teaching models safety standards. The method trains models to explicitly recall the relevant standards and reason over them accurately before answering. It has been used to align OpenAI's o-series models, achieving highly precise adherence to OpenAI's safety policies.
Currently, OpenAI is advancing external safety testing and has opened early access applications on its website. Applicants must fill out an online form and provide relevant information. Selected researchers will be granted access to o3 and o3-mini to explore their capabilities and contribute to safety assessments.