Claude-3 surpasses human average IQ, Anthropic leads AI intelligence into a new era

AIbase基地

Published in AI News · 9 min read · Apr 22, 2025

Anthropic's Claude-3 model has scored above 100, the human average, on an IQ test for the first time, a milestone in AI development. According to AIbase, Claude-3 outperformed its predecessors on the Norwegian Mensa IQ test, a marked leap in measured reasoning ability. Community analysis credits Anthropic's technical strength, and the result has sparked wide discussion about the future of AI. The data and predictions have been shared publicly on tech forums, and AIbase provides an in-depth analysis.

Claude Series: A Steady Trajectory of Enhanced Intelligence

The Claude series of models showcases Anthropic's continuous progress in AI research and development. AIbase has compiled its IQ test performance and release history:

Claude-1 (March 2023): Answered 6 questions correctly, for an IQ of approximately 64, close to random guessing; this set the baseline for later improvements.

Claude-2 (July 2023): Answered 12 questions correctly, improving its IQ to 82, an increase of approximately 18 IQ points, demonstrating significant progress in reasoning ability.

Claude-3 (March 2024): Answered 18.5 questions correctly, achieving an IQ of 101, exceeding the average human level for the first time, adding approximately 19 IQ points, showcasing strong pattern recognition and problem-solving capabilities.

The community notes that each model upgrade brought a similar gain: 6 to 6.5 more correct answers and 18 to 19 more IQ points, or roughly 3 IQ points per additional question. Some speculate that Anthropic may time its model releases around internal benchmarks. AIbase believes this steady progress reflects Anthropic's deep accumulation in data quality, training scale, and algorithm design.
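The per-upgrade gains can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the scores quoted above; the `results` list and the `deltas` helper are illustrative names, not part of any published tooling.

```python
# Scores as reported in the article: (model, correct answers out of 35, estimated IQ)
results = [
    ("Claude-1", 6.0, 64),
    ("Claude-2", 12.0, 82),
    ("Claude-3", 18.5, 101),
]

def deltas(rows):
    """Return (upgrade label, extra questions, extra IQ points, IQ points per question)."""
    out = []
    # Pair each model with its successor to measure the gain per upgrade.
    for (prev_name, prev_q, prev_iq), (name, q, iq) in zip(rows, rows[1:]):
        extra_questions = q - prev_q
        extra_iq = iq - prev_iq
        out.append((f"{prev_name} -> {name}", extra_questions, extra_iq,
                    extra_iq / extra_questions))
    return out

for label, dq, diq, ratio in deltas(results):
    print(f"{label}: +{dq} questions, +{diq} IQ points, {ratio:.2f} points per question")
```

Both upgrades come out at close to 3 IQ points per additional correct answer, which is the regularity the community points to.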

Technical Analysis: From Matrix Tests to Cognitive Leaps

Claude-3 was tested on the Norwegian Mensa's 35-question matrix-style IQ test, with the questions described in text so that the model could take the test without visual input. AIbase's analysis points to the following factors behind its result:

Enhanced Pattern Recognition: Claude-3 outperformed its predecessors in complex matrix problems (after question 18), indicating a breakthrough in multi-layered pattern processing and abstract reasoning.

Contextual Understanding: Through pre-training and Reinforcement Learning from Human Feedback (RLHF), Claude-3 can more accurately interpret the semantics of questions, reducing irrelevant assumptions.

Efficient Reasoning: Combined with the Constitutional AI framework, the model demonstrates near-human fluency in logical reasoning and complex tasks.

However, AIbase notes that IQ tests are designed for human cognition, and their direct application to AI may have limitations. For example, training data contamination could affect test fairness, necessitating validation of the model's generalization ability through novel questions.
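One common heuristic for spotting possible contamination is to measure how many word n-grams of a test question also appear in a reference corpus. The sketch below illustrates that idea only; it is not the method used by AIbase, Anthropic, or the test's authors, and the function names, the default `n=5`, and the sample strings are all assumptions made for illustration.

```python
def ngrams(text, n=5):
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_fraction(question, corpus_docs, n=5):
    """Fraction of the question's n-grams that occur in any corpus document."""
    question_grams = ngrams(question, n)
    if not question_grams:
        return 0.0  # question shorter than n words: nothing to compare
    found = set()
    for doc in corpus_docs:
        found |= ngrams(doc, n) & question_grams
    return len(found) / len(question_grams)

# Hypothetical example data, not real test items or training text.
question = "the quick brown fox jumps over the lazy dog"
corpus = ["a quick brown fox jumps high", "unrelated text about matrices"]
print(overlap_fraction(question, corpus, n=3))
```

A high fraction suggests the question may have been seen verbatim and should be replaced with a novel item; a low fraction does not prove the absence of paraphrased leakage.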

Future Predictions: The Intelligent Outlook of Claude-4 to Claude-6

Based on the Claude series' release cycle and performance improvements, the community has made bold future predictions. AIbase summarizes these as follows:

Claude-4 (Expected March-July 2025): Projected to arrive 12-16 months after Claude-3, answering approximately 25 questions correctly for an IQ of about 120 (equivalent to "mildly gifted"), and potentially excelling further in code generation and mathematical reasoning.

Claude-5 (Expected July 2026-March 2028): Released after 16-32 months, answering approximately 31 questions correctly, achieving an IQ of approximately 140 (approaching top human intelligence), suitable for complex strategic planning and cross-disciplinary tasks.

Claude-6 (Expected March 2028-March 2033): Released after 20-64 months, answering all 35 questions correctly, exceeding the IQ of almost all humans, potentially demonstrating superhuman-level general intelligence.

AIbase emphasizes that these predictions are based on simple extrapolations, and actual progress may be affected by budget, energy, regulatory, or technological bottlenecks. For example, the energy consumption and data requirements for training ultra-large-scale models may become limiting factors.
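To make the "simple extrapolation" concrete, the sketch below fits a straight line through the three reported (correct answers, IQ) pairs and reads off the projected IQ for the predicted answer counts. It assumes a purely linear score-to-IQ relationship, which is a simplification and not necessarily how the community forecast was produced; `fit_line` and `projected_iq` are illustrative names.

```python
# (correct answers out of 35, estimated IQ) as reported in the article
observed = [(6.0, 64.0), (12.0, 82.0), (18.5, 101.0)]

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(observed)

def projected_iq(correct_answers):
    """Linear projection of IQ from a number of correct answers."""
    return slope * correct_answers + intercept

# Answer counts taken from the community predictions above.
for model, answers in [("Claude-4", 25), ("Claude-5", 31), ("Claude-6", 35)]:
    print(f"{model}: {answers} correct -> IQ of about {projected_iq(answers):.0f}")
```

Under this assumption the line gives about 120 for 25 correct answers and the high 130s for 31, close to the figures quoted. A real IQ scale is not guaranteed to stay linear near the test's ceiling, so the value for a perfect score is the least reliable of the three.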

Application Prospects: From Tool to Partner

Claude-3's IQ breakthrough opens up new possibilities for AI applications. AIbase analyzes potential scenarios including:

Professional Assistance: In legal, medical, and research fields, Claude-3 can provide high-precision analysis and decision support, reducing the workload of human experts.

Educational Innovation: Through personalized teaching and complex problem-solving, AI can provide students with customized learning experiences.

Creative Industries: Combining multimodal capabilities (text and image processing), Claude-3 can assist in content creation, such as generating scripts or designing concepts.

Enterprise Automation: In data analysis, process optimization, and customer service, Claude-3's efficient reasoning capabilities can improve operational efficiency.

Community tests show Claude-3 achieved near-perfect recall (99%) in a "needle in a haystack" test and even identified limitations in the test design, suggesting a degree of metacognition. AIbase believes this supports its reliability in complex tasks.

Challenges and Reflections: Limitations of IQ Tests

While Claude-3's IQ breakthrough is exciting, AIbase cautions that IQ tests are not the sole measure of AI intelligence:

Test Limitations: IQ tests focus on logic and pattern recognition, excluding creativity, emotional intelligence, or long-term planning—key dimensions of human intelligence.

Data Contamination Risk: If test questions appear in the training data, the model might score through memorization rather than reasoning, requiring validation through original questions.

Ethical Considerations: As AI intelligence approaches or surpasses human levels, safety, transparency, and value alignment become urgent issues, and Anthropic's Constitutional AI framework may provide guidance.

The community recommends developing a more comprehensive AI evaluation system, incorporating multimodal tasks and dynamic interaction tests to more accurately measure AI's general intelligence level.

Future Outlook: Accelerated Evolution of AI Intelligence

Claude-3's success instills confidence in the AI industry but also prompts deep reflection on the future. AIbase predicts Anthropic may continue iterating models at an 8-16-month cycle, combining Moore's Law hardware advancements with algorithm optimizations, potentially accelerating AI IQ growth. However, regulatory pressure, energy costs, and ethical controversies may slow this progress. The community anticipates Claude-4 will bring more surprises in 2025, such as stronger multimodal capabilities or lower inference costs. AIbase believes Anthropic's open-source spirit and safety-first approach will promote the healthy development of the AI ecosystem.

Anthropic Releases Best Practices Guide for Claude Code, Seamlessly Integrating AI into Developer Workflows

Anthropic recently released a comprehensive best practices guide for Claude Code, providing developers with a low-level, command-line interface (CLI)-centric tool to seamlessly integrate the Claude large language model into their daily programming tasks. Based on Anthropic's internal best practices, this guide emphasizes flexible, secure, and efficient coding patterns, offering valuable guidance for engineers looking to incorporate AI into their existing development environments.

OpenAI's New o3 AI Model Shows Increased Hallucination, Raising Accuracy Concerns

OpenAI recently released its latest o3 and o4-mini AI models, which achieve state-of-the-art performance in many areas. However, the new models have not improved on the problem of "hallucinations" and in fact hallucinate more than previous OpenAI models. Hallucination, the generation of factually incorrect information by AI models, remains one of the most challenging problems in AI today. Previous generations of models each reduced hallucinations somewhat; o3 and o4-mini have not.

Unveiling Claude's Values: 700,000 Conversations Reveal its Ethical Framework

Anthropic, an AI company, recently published a significant study analyzing the values expressed by its AI assistant, Claude, in real-world conversations. By deeply analyzing 700,000 anonymized conversations, the research team revealed 3,307 unique values demonstrated by Claude across various contexts, offering new insights into AI alignment and safety. This research aimed to assess whether Claude's behavior aligns with its design goals. The research team developed a novel evaluation method...

Swiss Researchers Claim AI Can Identify Hidden Locations of Potentially Habitable Planets

The search for another Earth-like planet in the vast universe has been akin to searching for a needle in a haystack. However, a research team from Switzerland has injected powerful new momentum into this epic exploration. They have developed an AI model that acts like a sharp-eyed interstellar detective, able to penetrate the dust and identify unknown corners that may harbor habitable worlds. This is not merely a technological breakthrough, but also a roadmap to the future. In a recent study published in Astronomy & Astrophysics, the scientists detail...

iFlytek's StarFire X1 Receives Major Upgrade: Aims to Rival OpenAI in AI

On April 21st, iFlytek officially announced a significant upgrade to its AI model, StarFire X1, aiming to compete with OpenAI's models in intelligent reasoning and multi-tasking capabilities. This domestically-trained large language model excels in various general tasks, including mathematics, programming, logical reasoning, text generation, language understanding, and knowledge question answering. This upgrade incorporates data from more complex scenarios, significantly improving the model's performance.

xAI Releases Grok3Mini: A Cost-Effective AI Model for Developers

xAI recently unveiled its new language model, Grok3Mini, further advancing efficient AI technology. Designed for speed and affordability, Grok3Mini, despite its smaller size, outperforms many more expensive AI models across various domains, particularly excelling in math, coding, and scientific benchmarks. Grok3Mini: The perfect balance of high performance and low cost. Grok3Mini is part of the Grok3 series, which comprises six variants, among them the standard Grok3.

Intel Open-Sources AI Playground: Arc GPU-Powered Local AI Model Execution

Intel recently announced the open-sourcing of its AI Playground software, designed for local generative AI. AI Playground provides a powerful platform for running AI models on Intel Arc GPUs. It supports various image and video generation models, as well as Large Language Models (LLMs), significantly lowering the hardware barrier for AI applications by optimizing local computing resources. The project is available on GitHub and has attracted developers and AI enthusiasts worldwide.

OpenAI's o3 Model Test Scores Questioned; Actual Performance Falls Far Short of Claims

OpenAI's recently released o3 AI model has sparked controversy over its benchmark performance. In December, OpenAI claimed the model could correctly answer over a quarter of the highly challenging FrontierMath math problems, but this contrasts starkly with recent independent results. Independent testing by the research institute Epoch AI found the model achieved only about a 10% success rate, significantly lower than advertised.