Recent research led by the Department of Cognitive Science at the University of California, San Diego, reports a milestone in artificial intelligence: when prompted to adopt a human "persona," OpenAI's GPT-4.5 was judged to be human more often than the real people it was compared against in a standard three-party Turing test. The result reshapes our understanding of AI's language capabilities and opens up new possibilities for AI applications in social intelligence.

The experiment compared four AI systems: the classic 1960s chatbot ELIZA, Meta AI's LLaMa-3.1-405B, and OpenAI's GPT-4o and GPT-4.5. The researchers ran the same study on two independent populations: 126 UCSD undergraduates and 158 participants recruited through the online platform Prolific. It used the classic three-party Turing test format: each judge chatted over a text interface with two conversational partners at once, one human and one AI, for 5 minutes, and then decided which of the two was the human.

The results were striking. With the persona prompt, GPT-4.5 was judged to be the human 73% of the time, significantly more often than judges picked the real human it was paired with, making it the first AI model to "pass" a standard three-party Turing test. LLaMa-3.1-405B, given the same prompt, was judged human 56% of the time, which is statistically indistinguishable from the humans it was compared against. The two baselines, ELIZA and GPT-4o, fell well below chance at 23% and 21% respectively. Researchers noted GPT-4.5's natural fluency and its ability to adapt to the judge's tone; participants often described it as "friendly" or "authentic."
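To make the arithmetic behind a "pass rate" concrete, here is a minimal Python sketch. It assumes the three-party design described above, in which each judge must pick exactly one of the two partners as human, so the human's share of choices is simply one minus the AI's share. The trial count (100) and the function names are hypothetical, chosen only to illustrate a 73% rate; they are not the study's data, and the exact binomial test shown here is a generic choice rather than the paper's own analysis.

```python
from math import comb


def win_rate(ai_chosen: int, trials: int) -> float:
    """Share of trials in which the judge picked the AI as the human."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return ai_chosen / trials


def binom_two_sided_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely
    than the observed one under the null hypothesis (chance = p).
    """
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k)
           for k in range(trials + 1)]
    observed = pmf[successes]
    # Small tolerance so ties are not lost to floating-point rounding.
    total = sum(x for x in pmf if x <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts for illustration only (not the study's data).
trials = 100
ai_chosen = 73

rate = win_rate(ai_chosen, trials)   # AI judged to be the human
human_rate = 1 - rate                # human judged to be the human
p_value = binom_two_sided_p(ai_chosen, trials)

print(f"AI chosen as human:    {rate:.0%}")
print(f"Human chosen as human: {human_rate:.0%}")
print(f"Two-sided p-value against 50% chance: {p_value:.2g}")
```

A rate near 50% would mean judges could not tell the AI from the human. A rate well above 50%, as in this illustration, means judges picked the AI in preference to the real person.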

More noteworthy is GPT-4.5's apparent "human-like social intelligence." The research team suggests the model quickly picks up on emotional cues in short conversations and responds in ways that match human social expectations. For instance, when judges showed confusion or emotional distress, GPT-4.5 offered comforting or humorous replies. This nuanced interaction led many participants to believe they were talking to a real person.

In contrast, LLaMa-3.1-405B, while technically impressive, showed somewhat weaker emotional expression and contextual adaptability. Even so, its near-human performance with the persona prompt highlights the potential of open-source models in the AI race. GPT-4o, the predecessor to GPT-4.5, was tested as a baseline and lagged far behind in personalized expression and dynamic adjustment.

Some industry observers attribute GPT-4.5's success to more sophisticated persona-based mechanisms and conversational strategies. Unlike the "improvisational generation" of traditional language models, GPT-4.5 seems to establish a "predictive framework" before a conversation and then adjust its responses based on real-time feedback. This makes it appear exceptionally "clever" in short exchanges and masks its mechanical nature. It also raises the question of whether the Turing test remains the ultimate measure of AI intelligence. Some scholars argue that GPT-4.5's success relies more on mimicking human social behavior than on true understanding or autonomous thought.

Regardless, GPT-4.5's result adds momentum to AI development. Its human-like conversational abilities could lead to more practical applications, from educational tutoring and psychological support to customer service. Its high pass rate is also a reminder that as AI becomes more human-like, telling reality from simulation and regulating the technology's use will be crucial societal challenges.

The study arrives at a time of rapid AI iteration. GPT-4.5's result is not just a technical victory for OpenAI but also a profound challenge to how we think about the human-machine relationship. As one participant remarked, "It felt like I was chatting with a friend, until I realized it was all code magic." In this ongoing dialogue between humans and AI, the real test may have just begun.

Paper Link: https://arxiv.org/pdf/2503.23674