Zhipu AI Releases CritiqueLLM Scoring Model to Evaluate Text Generation Model Performance

Chinaz (站长之家)

Published in AI News · 3 min read · Dec 12, 2023

Zhipu AI recently introduced CritiqueLLM, a high-quality, low-cost scoring model for assessing the output of text generation models. Traditional metrics such as BLEU and ROUGE score a text mainly by its n-gram overlap with a reference, so they capture surface wording rather than overall meaning. Model-based evaluation, on the other hand, depends heavily on the choice of base model, and only top-tier large models produce satisfactory results.

To address these issues, CritiqueLLM offers an interpretable and scalable approach to text quality evaluation: for a wide range of tasks, it generates both a score and an explanation of that score. When a reference text is available, CritiqueLLM compares the large model's output against the reference before scoring it. Across eight common tasks, CritiqueLLM's scores correlated significantly better with human ratings than those of other evaluation models. The advantage was most pronounced in the setting without reference texts, where CritiqueLLM outperformed GPT-4 on three tasks and achieved the best evaluation performance.

The CritiqueLLM method consists of four main steps: augmenting user queries, collecting evaluation data with reference texts, rewriting the evaluation data for the setting without reference texts, and training the CritiqueLLM model. These steps yield two CritiqueLLM variants, one for the setting with reference texts and one for the setting without, which are then used to evaluate the performance of text generation models.
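
The article's two technical points, that n-gram metrics miss meaning and that an evaluator is judged by how well its scores correlate with human ratings, can be made concrete with a small Python sketch. This is not CritiqueLLM's code and not the official BLEU or ROUGE implementation: `ngram_overlap_f1` is a hypothetical, simplified ROUGE-N-style F1 (lowercased whitespace tokens, clipped counts, no stemming or brevity penalty), and agreement with humans is measured with Spearman rank correlation, one common choice for this kind of meta-evaluation (the article does not say which coefficient Zhipu AI reported). The helper names, sentences, and 1-5 human ratings are all invented for illustration.

```python
"""Toy meta-evaluation: does an n-gram overlap score agree with human ratings?"""
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of contiguous n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_overlap_f1(candidate, reference, n=1):
    """Simplified ROUGE-N-style F1 on lowercased whitespace tokens.

    Matches are clipped: an n-gram counts at most as often as it occurs
    in the other text. No stemming, smoothing, or brevity penalty.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidates = [
        "the cat sat on the mat",           # exact copy
        "the mat sat on the cat",           # same words, wrong meaning
        "a feline was resting on the rug",  # faithful paraphrase
        "the dog ran in the park",          # unrelated
    ]
    human = [5, 1, 5, 1]  # invented 1-5 ratings, for illustration only
    metric = [ngram_overlap_f1(c, reference) for c in candidates]
    print([round(m, 3) for m in metric])      # [1.0, 1.0, 0.308, 0.333]
    print(round(spearman(metric, human), 3))  # -0.236
```

Word-level overlap gives the scrambled sentence a perfect score and ranks the faithful paraphrase below an unrelated sentence, so the rank correlation with the toy human ratings comes out negative; switching to bigrams (`n=2`) still leaves the scrambled sentence at 0.8. In principle, a model-based judge would be meta-evaluated the same way, by substituting its scores for `metric`.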

This article is from AIbase Daily
