First AI Gaokao Assessment Results Released: GPT-4o Takes Second Place

AIbase

Published in AI News · 4 min read · Jun 20, 2024

In artificial intelligence, the college entrance examination is no longer a stage for humans alone. The Shanghai Artificial Intelligence Laboratory recently put AI's academic ability to the test with its own "college entrance exam" (gaokao). Using the OpenCompass evaluation system, the lab tested seven AI models, including GPT-4o, in Chinese, Mathematics, and English.

The test used the National New Curriculum Standard I paper. To keep the test fair, every participating open-source model had already been released before the college entrance examination took place. The AI "answer sheets" were graded by hand by teachers with experience marking the college entrance examination, so that the scores would reflect real marking standards as closely as possible.

The models evaluated came from a range of developers: the Mixtral 8x22B dialogue model from French AI startup Mistral, Yi-1.5-34B from 01.AI (Lingyi Wanwu), GLM-4-9B from Zhipu AI, InternLM2-20B-WQX from the Shanghai Artificial Intelligence Laboratory, and the Qwen2 series from Alibaba. GPT-4o, a closed-source model, was included only as a reference.

Qwen2-72B led with a total score of 303, followed closely by GPT-4o with 296 points and InternLM2-20B-WQX in third place with 295.5 points. The models performed well in Chinese and English, with an average score rate of 67% in Chinese and 81% in English. In Mathematics, however, the average score rate across all models was only 36%, which indicates significant room for improvement in mathematical reasoning.
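To make "score rate" concrete, here is a minimal Python sketch that converts the three reported totals into score rates. It assumes each subject is marked out of 150, giving a full total of 450 for the three subjects; the article itself does not state the full marks, so that figure and the helper name `score_rate` are illustrative assumptions rather than part of the published evaluation.

```python
# Assumption (not stated in the article): each subject is marked out of 150,
# so the full total for Chinese, Mathematics, and English is 450.
FULL_MARKS_PER_SUBJECT = 150
SUBJECTS = ("Chinese", "Mathematics", "English")

# Total scores as reported in the article.
reported_totals = {
    "Qwen2-72B": 303.0,
    "GPT-4o": 296.0,
    "InternLM2-20B-WQX": 295.5,
}


def score_rate(score: float, full_marks: float) -> float:
    """Return the score as a fraction of the full marks (hypothetical helper)."""
    if full_marks <= 0:
        raise ValueError("full_marks must be positive")
    return score / full_marks


full_total = FULL_MARKS_PER_SUBJECT * len(SUBJECTS)

for model, total in reported_totals.items():
    rate = score_rate(total, full_total)
    print(f"{model}: {total} / {full_total} = {rate:.1%}")
```

Under that assumption, the top three models would all land at roughly two-thirds of the available marks overall, which sits between the reported subject averages for English and Mathematics.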

The marking teachers also analyzed the models' answer sheets in detail. In Chinese, the models generally handled modern text comprehension well but were weaker in classical Chinese and essay writing. In Mathematics, the models recalled formulas reliably but lacked flexibility in applying them to solve problems. In English, the models performed well overall, though some had lower score rates on certain question types.

This "large-scale AI college entrance exam" showcases the potential of AI in academic settings, and it also reveals the models' limitations in understanding and applying knowledge. As the technology continues to advance, there is reason to believe that future AI will become more capable and better serve human society.

This article is from AIbase Daily
