Recently, the Doubao large model team at ByteDance, in collaboration with the M-A-P open-source community, released SuperGPQA, a knowledge reasoning benchmark covering 285 postgraduate disciplines and containing 26,529 professional questions.

This dataset not only encompasses mainstream disciplines like mathematics and physics but also, for the first time, brings long-tail disciplines such as light industry, agriculture, and service science into the evaluation system, filling a gap in existing benchmarks. SuperGPQA has already been used to reveal the performance gap between open-source and closed-source models, making it a valuable reference point for tracking AI progress.

Traditional benchmarks like MMLU and GPQA cover fewer than 50 disciplines, with long-tail disciplines accounting for less than 5%. Moreover, because they draw on a single data source (e.g., Wikipedia) and rely on crowdsourced annotations of uneven reliability, they struggle to measure model reasoning in complex scenarios. SuperGPQA, built over six months through an expert-LLM collaboration mechanism, draws its questions from authoritative sources. Each question offers 9.67 options on average, and 42.33% of questions require mathematical calculation or formal reasoning, giving the benchmark both breadth and depth. Experiments show that the best-performing model, DeepSeek-R1, achieves an accuracy of only 61.82%, indicating that current large language models still have considerable room for improvement across diverse knowledge domains.
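For readers who want to explore the dataset themselves, a minimal sketch of loading it from the Hugging Face Hub and recomputing the headline statistics might look like the following. The dataset ID comes from the data link below; the split name and the column names (`options`, `discipline`) are assumptions about the schema, not details confirmed by the paper:

```python
from datasets import load_dataset

# Load SuperGPQA from the Hugging Face Hub (dataset ID taken from the data link below).
# NOTE: the split name and the column names ("options", "discipline") are assumptions
# about the schema -- print(ds.column_names) to verify before relying on them.
ds = load_dataset("m-a-p/SuperGPQA", split="train")

num_questions = len(ds)
avg_options = sum(len(row["options"]) for row in ds) / num_questions
num_disciplines = len({row["discipline"] for row in ds})

print(f"{num_questions} questions, {num_disciplines} disciplines, "
      f"{avg_options:.2f} options per question on average")
```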

SuperGPQA employs a three-stage process to ensure quality: expert screening of initial questions, standardized transcription, and multi-layered quality checks (rule-based filtering, LLM detection, and expert review). Evaluation results show that instruction fine-tuning significantly improves performance, with DeepSeek-V3 outperforming its base version; even so, open-source models still lag behind closed-source models on the hardest questions.
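To make the shape of that pipeline concrete, here is a minimal Python sketch of how the three quality layers could be chained. The specific rules, the `judge` callable, and the `expert_review` callback are hypothetical stand-ins for illustration, not the team's actual implementation:

```python
def rule_filter(item: dict) -> bool:
    """Stage 1 -- rule-based filtering. The checks below are illustrative
    structural rules, not the SuperGPQA team's exact filters."""
    options = item["options"]
    return (
        len(options) == len(set(options))   # no duplicate answer options
        and item["answer"] in options       # gold answer must appear among the options
        and len(item["question"].strip()) > 0
    )

def llm_flags_item(item: dict, judge) -> bool:
    """Stage 2 -- LLM detection. `judge` is a hypothetical callable wrapping
    whatever model you use; it takes a prompt and returns the raw text verdict."""
    prompt = ("Answer yes or no: is the following question ambiguous, "
              f"miskeyed, or trivially searchable?\n\n{item['question']}")
    return judge(prompt).strip().lower().startswith("yes")

def quality_pipeline(items, judge, expert_review):
    """Chain the three layers: rule filtering -> LLM detection -> expert review.
    `expert_review` stands in for a human sign-off step (e.g., a review queue)."""
    survivors = [it for it in items if rule_filter(it)]
    survivors = [it for it in survivors if not llm_flags_item(it, judge)]
    return [it for it in survivors if expert_review(it)]
```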

Paper Link: https://arxiv.org/pdf/2502.14739

Data Link: https://huggingface.co/datasets/m-a-p/SuperGPQA

Code Link: https://github.com/SuperGPQA/SuperGPQA