In 1950, Alan Turing devised a now-famous method for testing whether machines possess intelligence: the Turing Test. Simply put, if a machine can hold a text-based conversation without being distinguishable from a human, it passes the test and is considered intelligent.

However, as technology has advanced, a new question has emerged: if we are not interacting with an AI directly but instead reading transcripts of its conversations with others, can we still accurately tell who is human and who is a machine?

Recently, a group of scientists at the University of California, San Diego delved into this question. They designed two modified versions of the Turing Test, the "Inverted Turing Test" and the "Displaced Turing Test," to explore it.

[Image: AI robots in an interview and negotiation scenario. Image generated by AI; licensed service provider: Midjourney]

In the Inverted Turing Test, the AI is no longer the subject of the test but the judge. The scientists had two large language models, GPT-3.5 and GPT-4, read transcripts of real human-AI dialogues and then judge whether each participant was human or AI.
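To make the setup concrete, here is a minimal sketch of how such an inverted-test judge could be scripted, assuming the OpenAI Python client. The prompt wording, model name, and scoring loop are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of an inverted Turing test: an LLM acts as the judge
# and classifies the witness in a conversation transcript as human or AI.
# Assumes the `openai` package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def judge_transcript(transcript: str, model: str = "gpt-4") -> str:
    """Ask an LLM judge whether the witness in a transcript is HUMAN or AI."""
    prompt = (
        "Below is a transcript of a conversation between an interrogator "
        "and a witness. Decide whether the witness is a human or an AI, "
        "and answer with a single word: HUMAN or AI.\n\n" + transcript
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic verdicts make accuracy easier to score
    )
    return response.choices[0].message.content.strip().upper()

# Hypothetical usage: score the judge against labelled transcripts.
transcripts = [
    ("Interrogator: What did you have for breakfast?\nWitness: Toast.", "HUMAN"),
]
correct = sum(judge_transcript(text) == label for text, label in transcripts)
print(f"judge accuracy: {correct / len(transcripts):.2f}")
```

Comparing this judge's accuracy against that of human judges, both displaced and interactive, is the essence of the paper's experimental design.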

The results were surprising: not only did these AI judges have a lower accuracy rate than human judges who took part in the conversation directly, but in many cases they mistakenly identified AI as human. Strikingly, the AI judges classified the best-performing GPT-4 model as human more often than they did the actual human participants.


The scientists also ran the Displaced Turing Test, in which human judges read transcripts of human-AI dialogues. Even these human judges turned out to be less accurate than human judges who were directly involved in the conversation.

These findings tell us that without direct interaction, both humans and AI struggle to accurately determine whether the other party is human or a machine. This has significant implications for everyday online communication, where we often form judgments about others by reading their conversations rather than by talking to them directly.


It also means that if we rely on AI to detect fake information or AI impersonating humans online, we may need more precise tools, because current AI models perform no better than humans at this task.

This research not only deepens our understanding of AI but also reveals an important challenge in AI development: how to design better tools to detect and distinguish between content generated by AI and content generated by humans.

As AI technology continues to advance, this issue will become increasingly important. We need to ensure that while we enjoy the convenience brought by AI, we can also protect our data security and the authenticity of our online environment.

Paper link: https://arxiv.org/pdf/2407.08853