The article examines the "benchmarking chaos" in current evaluation systems for large models, noting the widespread phenomenon of "everyone being number one" on the leaderboards. Open-source benchmark datasets encourage a "problem-solving" (teaching-to-the-test) mentality, since models can be tuned on the very questions they are later scored on, while fully closed proprietary datasets raise fairness concerns because outside parties cannot scrutinize them. In addition, some rankings lack scientific and comprehensive evaluation dimensions. The article proposes establishing an authoritative evaluation system: open-source the evaluation tools and processes to ensure fairness, while adopting an "open historical datasets + closed official test sets" model for the evaluation data itself. It further argues that commercializing large models matters far more than parameter counts or leaderboard rankings.
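To make the proposed "open tooling + open historical sets + closed official set" split concrete, below is a minimal Python sketch of how such an evaluation harness could be organized. It is an illustration under stated assumptions, not anything described in the article: all names (`Question`, `exact_match_score`, `run_official_round`, `HISTORICAL_2023`) are hypothetical. The scoring logic is public and reproducible, past question sets are released for development, and each round's official questions stay with the evaluator until the round closes.

```python
# Hypothetical sketch of an "open tooling, open historical sets, closed
# official set" evaluation harness; names are illustrative, not from the article.
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Question:
    prompt: str
    answer: str


def exact_match_score(model_fn: Callable[[str], str],
                      questions: Sequence[Question]) -> float:
    """Open-source scoring logic: anyone can inspect and rerun it."""
    correct = sum(model_fn(q.prompt).strip() == q.answer for q in questions)
    return correct / len(questions)


# Historical benchmark rounds are published after each cycle, so teams can
# study them freely; over-fitting to them no longer affects the official score.
HISTORICAL_2023 = [
    Question("What is 2 + 2?", "4"),
    Question("Capital of France?", "Paris"),
]


def run_official_round(model_fn: Callable[[str], str],
                       private_questions: Sequence[Question]) -> float:
    """The current round's questions stay with the evaluator; only the
    aggregate score is released, then the set rotates into the public
    historical pool for the next cycle."""
    return exact_match_score(model_fn, private_questions)


if __name__ == "__main__":
    # A toy "model" used only to show how teams would self-evaluate on the
    # open historical set before submitting to the closed official round.
    toy_model = lambda prompt: {"What is 2 + 2?": "4"}.get(prompt, "unknown")
    print("dev score on historical set:",
          exact_match_score(toy_model, HISTORICAL_2023))
```

The key design choice this sketch illustrates is the rotation: every official test set eventually becomes a historical set, so openness and resistance to test-set gaming are balanced over time rather than traded off once.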