GPT-4 Becomes a Reviewer for Nature? Test by Stanford and Tsinghua Alumni Shows Over Half of Its Review Comments Match Human Reviews

Recently, researchers from Stanford University and other institutions fed thousands of papers from top venues, including Nature family journals and ICLR, into GPT-4 to generate review comments, which they then compared with those of human reviewers. The results showed that GPT-4's comments overlapped with human reviewers' more than 50% of the time, and 82% of surveyed authors found the feedback helpful. The study also revealed that, compared with human reviewers, GPT-4 paid more attention to a paper's broader impact and significance and less to details such as supplementary ablation experiments. Users generally believed that GPT-4's review feedback could improve review quality and reduce labor costs. The research suggests that using large language models (LLMs) to assist in academic paper review is feasible.

新智元
This article is from AIbase Daily