Zhipu AI Releases CritiqueLLM Scoring Model to Evaluate Text Generation Model Performance
ChinaZ (站长之家)
Zhipu AI recently introduced CritiqueLLM, a high-quality, low-cost scoring model for assessing the performance of text generation models. Traditional evaluation metrics such as BLEU and ROUGE compute scores mainly from n-gram overlap, so they fail to capture overall semantics. Model-based evaluation methods, by contrast, depend heavily on the choice of base model, and only top-tier large models achieve satisfactory results.

To address these issues, CritiqueLLM offers an interpretable and scalable approach to text quality evaluation that generates high-quality scores, along with explanations of those scores, for a variety of tasks. In the setting with reference texts, CritiqueLLM compares the text generated by a large model against the reference text and assigns a score. Across eight common tasks, CritiqueLLM's scores correlated significantly more closely with human ratings than those of other models. The gain was most notable in the setting without reference texts, where CritiqueLLM outperformed GPT-4 on three tasks and achieved the best evaluation performance.

The CritiqueLLM method has four main steps: user query augmentation, collection of evaluation data with reference texts, rewriting of the evaluation data for the setting without reference texts, and training of the CritiqueLLM model. These steps yield two CritiqueLLM models, one for the setting with reference texts and one for the setting without, both used to evaluate the performance of text generation models.
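To make the limitation of overlap-based metrics concrete, here is a minimal Python sketch of clipped n-gram precision, the core ingredient of BLEU. It is a simplified illustration, not full BLEU or ROUGE (there is no brevity penalty and no averaging over n-gram orders), and it is not part of CritiqueLLM. The function names and example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n=2):
    """Fraction of candidate n-grams that also occur in the reference.

    Counts are clipped so a repeated n-gram is credited at most as many
    times as it appears in the reference. Tokenization is a plain
    whitespace split, which is an assumption made for brevity.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


reference = "the cat sat on the mat"
paraphrase = "a feline rested upon the rug"   # similar meaning, different words
reversed_roles = "the mat sat on the cat"     # same words, different meaning

paraphrase_score = ngram_precision(paraphrase, reference)
reversed_score = ngram_precision(reversed_roles, reference)

print(f"paraphrase:     {paraphrase_score:.2f}")
print(f"reversed roles: {reversed_score:.2f}")
```

In this toy case the paraphrase shares no bigrams with the reference and scores 0.00, while the sentence that swaps the roles of cat and mat scores 0.80. Surface overlap rewards shared wording rather than shared meaning, which is the gap that model-based scorers such as CritiqueLLM aim to close.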
© Copyright AIbase Base 2024, Click to View Source - https://www.aibase.com/news/4096