An analysis of scientific papers from the past decade shows that certain "style" words favored by artificial intelligence models are being overused in research writing, words that were rarely used only a few years ago.
In a new study that has not yet been peer-reviewed, researchers borrowed an approach from epidemiology to analyze "excess word usage" in biomedical papers, revealing that large language models tend to overuse certain words. The findings offer an interesting look at the impact of artificial intelligence on academia, suggesting that at least 10% of abstracts published in 2024 were processed with large language models.
Image Source Note: Image generated by AI, licensed by Midjourney
This study analyzed 14 million biomedical abstracts published on PubMed between 2010 and 2024, comparing papers published before 2023 with those published after large language models like ChatGPT became widely commercialized. Previously "uncommon" words such as "delves" are now used 25 times more frequently than before, while words like "showcasing" and "underscores" have seen similar jumps. Some "common" words have also risen: the frequency of words such as "potential," "findings," and "crucial" increased by up to about 4%.
The researchers point out that an increase of this size is unprecedented in the absence of a major global event to explain it. Between 2013 and 2023, the words showing excess usage included nouns closely tied to real-world events, such as "Ebola," "coronavirus," and "lockdown." In 2024, by contrast, almost all of the excess words were "style" words: of the 280 excess "style" words identified that year, two-thirds were verbs and about one-fifth were adjectives.
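To make the idea of "excess word usage" concrete, here is a minimal Python sketch of how such a comparison could work in principle. It measures, for each word, the fraction of abstracts containing it in a pre-LLM baseline corpus versus a recent corpus, then reports the frequency ratio and the absolute gap. The actual study extrapolates expected 2024 frequencies from earlier trends rather than doing a raw before/after comparison, so this is only an illustrative simplification; the function names and toy abstracts below are invented for the example.

```python
from collections import Counter

def word_frequencies(abstracts):
    """Fraction of abstracts in which each word appears at least once."""
    counts = Counter()
    for text in abstracts:
        counts.update(set(text.lower().split()))
    n = len(abstracts)
    return {word: c / n for word, c in counts.items()}

def excess_usage(baseline_abstracts, recent_abstracts):
    """Compare per-word frequencies between a pre-LLM baseline corpus and a
    recent corpus, returning the frequency ratio and the absolute gap."""
    base = word_frequencies(baseline_abstracts)
    recent = word_frequencies(recent_abstracts)
    results = {}
    for word, f_recent in recent.items():
        f_base = base.get(word, 0.0)
        ratio = f_recent / f_base if f_base > 0 else float("inf")
        results[word] = {"ratio": ratio, "gap": f_recent - f_base}
    return results

# Toy corpora: a previously rare word like "delves" shows up with a huge
# ratio, while an already common word like "potential" shows a smaller
# absolute increase in the share of abstracts that contain it.
pre_2023 = [
    "we studied the potential role of protein x in tumor growth",
    "results demonstrate an effect of the drug on cell survival",
]
in_2024 = [
    "this study delves into the potential role of protein x in tumor growth",
    "our findings underscore the potential of the drug, showcasing improved cell survival",
]
for word, stats in sorted(excess_usage(pre_2023, in_2024).items(), key=lambda kv: -kv[1]["gap"]):
    print(word, stats)
```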
Using these excess style words as "markers" of ChatGPT use, the researchers estimate that about 15% of papers from non-English-speaking countries and regions such as China, Korea, and Taiwan are now processed with artificial intelligence, compared with about 3% in English-speaking countries such as the UK. Large language models may therefore be an effective tool for helping non-native speakers succeed in an English-dominated field.
Key Points:
🔍 Researchers found that artificial intelligence models are overusing certain "style" words in biomedical papers, words that were rarely used only a few years ago.
🔍 The widespread commercialization of large language models has led to a significant increase in the frequency of some words, indicating that the impact of artificial intelligence on academia may be unprecedented.
🔍 Roughly 15% of papers from non-English-speaking countries appear to have been processed with artificial intelligence, suggesting that large language models may be an effective tool for non-native speakers to succeed in an English-dominated field.