Recently, a research team from Princeton University released a study finding that, as of August 2024, approximately 4.36% of new Wikipedia articles contained significant AI-generated content. The research was conducted by Creston Brooks, Samuel Eggert, and Denis Peskoff, who used the detection tools GPTZero and Binoculars to identify AI-generated text.
The study shows that, compared with data from before the release of GPT-3.5, the share of AI-generated content in new Wikipedia articles rose significantly in 2024. Of the 2,909 English Wikipedia articles examined, GPTZero flagged 156 and Binoculars flagged 96, with 45 articles flagged by both tools. The flagged articles tended to be of lower quality, carry fewer citations, and integrate poorly into Wikipedia's knowledge network. Some appeared to be self-promotional, advertising individuals or businesses and often relying on superficial citations such as personal YouTube videos.
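For a sense of how much the two detectors agree, the counts quoted above can be combined with basic set arithmetic. The following minimal Python sketch is a reader-side check based only on the figures reported here, not code from the study, and the percentages it prints describe detector agreement rather than the study's headline 4.36% figure.

```python
# Detector-agreement check using the counts quoted above
# (2,909 articles sampled; 156 flagged by GPTZero; 96 by Binoculars; 45 by both).
# Illustrative reader-side arithmetic, not part of the study's methodology.

gptzero_flagged = 156
binoculars_flagged = 96
flagged_by_both = 45

# Articles flagged by at least one detector (inclusion-exclusion)
flagged_by_either = gptzero_flagged + binoculars_flagged - flagged_by_both

# Jaccard overlap: how similar the two flagged sets are overall
jaccard = flagged_by_both / flagged_by_either

# Conditional agreement: share of one detector's flags confirmed by the other
both_given_gptzero = flagged_by_both / gptzero_flagged
both_given_binoculars = flagged_by_both / binoculars_flagged

print(f"Flagged by either detector: {flagged_by_either}")
print(f"Jaccard overlap of the two flagged sets: {jaccard:.1%}")
print(f"GPTZero flags also caught by Binoculars: {both_given_gptzero:.1%}")
print(f"Binoculars flags also caught by GPTZero: {both_given_binoculars:.1%}")
```

The modest overlap implied by these counts is one reason the authors report results from both detectors rather than relying on either one alone.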
On the political side, eight articles clearly pushed specific viewpoints on contentious topics, such as the edit wars over Albanian history. Other users employed large language models (LLMs) to write on niche subjects, including fungi, cuisine, sports, and even chapter-by-chapter book summaries.
The study also compared Wikipedia with Reddit posts and UN press releases. AI-generated content on Reddit was far lower, accounting for less than 1%, which suggests that such content there is either rare, removed by moderation, or hard to detect. By contrast, the share of AI-generated UN press releases rose sharply, from under 1% before 2022 to 20% in 2024.
The report concludes by noting that, as generative LLMs advance, AI detection tools are evolving alongside them, yet evaluating these detectors across different text lengths, domains, and degrees of human-AI collaboration remains challenging. To meet the challenges posed by AI-generated content, individuals, educational institutions, businesses, and governments need reliable ways to verify human-created content, and regulators in various countries should strengthen oversight of AI-generated material. China, for instance, has begun taking steps to increase the transparency of AI-generated information online, issuing relevant draft regulations, while India issued recommendations this year on labeling AI-related content, although that proposal has drawn widespread controversy and criticism.
Key Points:
📊 The study finds that about 4.36% of new Wikipedia articles contain significant AI-generated content.
🔍 AI-generated content on Reddit accounts for less than 1% of posts, a marked contrast with Wikipedia.
🌐 Countries are exploring regulatory measures and labeling requirements for AI-generated content.