Recently, the Doubao large model team at ByteDance, in collaboration with the M-A-P open-source community, released SuperGPQA, a knowledge reasoning benchmark covering 285 postgraduate disciplines and containing 26,529 professional questions.

This dataset not only encompasses mainstream disciplines like mathematics and physics but also, for the first time, brings long-tail disciplines such as light industry, agriculture, and service science into the evaluation system, filling a gap in existing benchmarks. SuperGPQA has already been used to reveal the performance gap between open-source and closed-source models, making it a valuable reference point for tracking AI progress.

Traditional benchmarks like MMLU and GPQA cover fewer than 50 disciplines, with long-tail disciplines accounting for less than 5%. Moreover, because they draw on a single data source (e.g., Wikipedia) and rely on crowdsourced annotations of uneven reliability, they struggle to measure model reasoning in complex scenarios. SuperGPQA, built over six months through an expert-LLM collaboration mechanism, draws its questions from authoritative sources. Each question offers 9.67 options on average, and 42.33% of questions require mathematical calculation or formal reasoning, giving the benchmark both breadth and depth. Experiments show that the best-performing model, DeepSeek-R1, achieves an accuracy of only 61.82%, indicating that current large language models still have considerable room for improvement across diverse knowledge domains.
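For readers who want to explore the dataset themselves, a minimal sketch of loading it from the Hugging Face Hub and recomputing the headline statistics might look like the following. The dataset ID comes from the data link below; the split name and the column names (`options`, `discipline`) are assumptions about the schema, not details confirmed by the paper:

```python
from datasets import load_dataset

# Load SuperGPQA from the Hugging Face Hub (dataset ID taken from the data link below).
# NOTE: the split name and the column names ("options", "discipline") are assumptions
# about the schema -- print(ds.column_names) to verify before relying on them.
ds = load_dataset("m-a-p/SuperGPQA", split="train")

num_questions = len(ds)
avg_options = sum(len(row["options"]) for row in ds) / num_questions
num_disciplines = len({row["discipline"] for row in ds})

print(f"{num_questions} questions, {num_disciplines} disciplines, "
      f"{avg_options:.2f} options per question on average")
```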

SuperGPQA employs a three-stage process to ensure quality: expert screening of initial questions, standardized transcription, and multi-layered quality checks (rule-based filtering, LLM detection, and expert review). Evaluation results show that instruction fine-tuning significantly improves performance, with DeepSeek-V3 outperforming its base version; even so, open-source models still lag behind closed-source models on the hardest questions.
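To make the shape of that pipeline concrete, here is a minimal Python sketch of how the three quality layers could be chained. The specific rules, the `judge` callable, and the `expert_review` callback are hypothetical stand-ins for illustration, not the team's actual implementation:

```python
def rule_filter(item: dict) -> bool:
    """Stage 1 -- rule-based filtering. The checks below are illustrative
    structural rules, not the SuperGPQA team's exact filters."""
    options = item["options"]
    return (
        len(options) == len(set(options))   # no duplicate answer options
        and item["answer"] in options       # gold answer must appear among the options
        and len(item["question"].strip()) > 0
    )

def llm_flags_item(item: dict, judge) -> bool:
    """Stage 2 -- LLM detection. `judge` is a hypothetical callable wrapping
    whatever model you use; it takes a prompt and returns the raw text verdict."""
    prompt = ("Answer yes or no: is the following question ambiguous, "
              f"miskeyed, or trivially searchable?\n\n{item['question']}")
    return judge(prompt).strip().lower().startswith("yes")

def quality_pipeline(items, judge, expert_review):
    """Chain the three layers: rule filtering -> LLM detection -> expert review.
    `expert_review` stands in for a human sign-off step (e.g., a review queue)."""
    survivors = [it for it in items if rule_filter(it)]
    survivors = [it for it in survivors if not llm_flags_item(it, judge)]
    return [it for it in survivors if expert_review(it)]
```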

Paper Link: https://arxiv.org/pdf/2502.14739

Data Link: https://huggingface.co/datasets/m-a-p/SuperGPQA

Code Link: https://github.com/SuperGPQA/SuperGPQA