In the development of artificial intelligence, the Turing Test has long been regarded as a significant benchmark. Recently, researchers in the Department of Cognitive Science at the University of California, San Diego, ran a replication of the Turing Test with GPT-4, and the results were striking.

They recruited 500 participants for conversations involving four kinds of agent: a real human and three AI systems (the 1960s ELIZA program, GPT-3.5, and GPT-4). After five minutes of dialogue, participants had to judge whether they had been talking to a human or an AI.

The results showed that GPT-4 was judged to be human 54% of the time, compared with 50% for GPT-3.5 and only 22% for ELIZA; the real human was correctly identified as human 67% of the time. The researchers present this as the first experimental evidence that an AI system can perform convincingly enough in an interactive two-player Turing Test that participants cannot reliably distinguish it from a human.
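
To see why a rate near 50% matters, it helps to ask whether each reported percentage could plausibly come from judges guessing at random. The following is a minimal sketch of an exact two-sided binomial test against a 50% chance rate. The article does not give the number of conversations per agent, so `HYPOTHETICAL_TRIALS = 100` is an assumption made purely for illustration, and the function name `binomial_two_sided_p` is ours, not something from the study.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p0: float = 0.5) -> float:
    """Exact two-sided binomial p-value for `successes` out of `trials` under rate p0."""
    # Probability of every possible outcome under the null hypothesis.
    pmf = [comb(trials, k) * p0**k * (1 - p0) ** (trials - k) for k in range(trials + 1)]
    observed = pmf[successes]
    # Sum the probability of all outcomes that are no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    total = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# ASSUMPTION: the article does not report per-agent sample sizes; 100 is illustrative only.
HYPOTHETICAL_TRIALS = 100

# "Judged human" rates as reported in the article.
reported_rates = {"GPT-4": 0.54, "GPT-3.5": 0.50, "ELIZA": 0.22, "Human": 0.67}

for agent, rate in reported_rates.items():
    judged_human = round(rate * HYPOTHETICAL_TRIALS)
    p_value = binomial_two_sided_p(judged_human, HYPOTHETICAL_TRIALS)
    print(f"{agent}: {judged_human}/{HYPOTHETICAL_TRIALS} judged human, p = {p_value:.4f}")
```

Under that assumed sample size, a 54% rate would not differ significantly from coin-flip guessing, while ELIZA's 22% clearly would. The actual conclusion depends on the study's real counts, which the article does not state.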

The researchers also found that participants tended to rely on small talk and social-emotional strategies when questioning the agents. Their judgments were based mainly on language style and social-emotional factors, drawing on both the content of the conversation and how the agent behaved.

The study has two main implications. It suggests that current AI systems may be able to deceive humans in practical applications, and it bears directly on long-running discussions about machine intelligence. As interactions with humans and with AI become harder to tell apart, new challenges arise for the ethics, privacy, and security of artificial intelligence.