AI Ranking

2025-01-10 15:49:29.AIbase

The Glorious GLM-4-9B Model Achieves Only 1.3% Hallucination Rate, Winning First Place in Global Large Model Evaluation

2024-08-13 08:11:01.AIbase

The Compass Arena, a Large Model Evaluation Platform, Adds a Multi-Modal Large Model Competition Section

The Sinan OpenCompass team at the Shanghai Artificial Intelligence Laboratory, in collaboration with Modao ModelScope, has launched the Compass Multi-Modal Arena, a new section of the Compass Arena large model evaluation platform dedicated to multi-modal large models. Users upload an image and enter a question, two anonymous multi-modal large models each generate an answer, and the user then judges the quality of the generated content and selects the better-performing model. The platform offers an easy-to-use interface and a unique question bank.

2023-11-29 09:08:23.AIbase

In the "Baimao Battle," Every Family Claims First Place: When Will Cheating in Large Model "Scoring" Stop?

📊 Evaluation System of Large Models: The current evaluation system for large models has issues such as open-source datasets that can be manipulated, fairness problems arising from closed evaluation datasets, and evaluation metrics that are not sufficiently scientific and comprehensive.

💡 Trend of Large Model Applications: The article mentions that large models have evolved from model-level development to innovation at the application level.

🔎 Commercialization Issues of Large Models: For large model teams, achieving commercialization is far more important than rankings and parameters.

2023-11-02 15:21:41.AIbase

Ant Group Releases Benchmark for Large Model Evaluation in the DevOps Field

Ant Group, in collaboration with Peking University, has released a benchmark for evaluating large language models in the DevOps field. The benchmark contains 4,850 multiple-choice questions across 8 categories, including planning, coding, building, testing, and releasing. It also provides detailed evaluations for AIOps tasks, and the results show that score differences among the models are minimal.

2023-09-25 09:54:21.AIbase

Investigation into the Chaos of Large Model Evaluation: Parameter Scale Does Not Represent Everything

Parameter scale is not the only criterion for assessing large models. Differences in evaluation sets can lead to significant ranking variations. An increase in subjective question proportions can also affect rankings, raising questions about evaluation fairness. Third-party assessment organizations such as OpenCompass and FlagEval are gaining attention. The academic community believes that model robustness, safety, and other dimensions should also be considered. A truly comprehensive and effective evaluation method is still being explored.

2023-08-29 10:09:08.AIbase

August Rankings! SuperCLUE Releases Latest Rankings for Chinese Large Model Evaluation Benchmark

SuperCLUE has released its August rankings for Chinese large models. The release comprises 5 separate rankings covering 16 general-purpose large language models, evaluated on 3,337 new test questions. The performance gap between domestic large models and GPT-3.5 on Chinese tasks is narrowing.