A study recently published in *Scientific Reports* found that certain advanced AI chatbots outperform humans at assessing complex social situations.

Researchers applied a widely used psychological instrument, the Situational Judgment Test, and found that three chatbots (Claude, Microsoft Copilot, and you.com's smart assistant) outperformed human participants at selecting the most effective behavioral responses.

As social interaction grows in importance, the potential of AI for social engagement is becoming more evident, with applications ranging from customer service to mental health support. Large language models, such as the chatbots tested in this study, can process language, understand context, and produce effective responses. Although previous studies have demonstrated these models' abilities in academic reasoning and language tasks, their effectiveness in complex social dynamics has not been thoroughly explored.

The research team tested 276 human participants, all of them highly qualified pilot applicants, using a Situational Judgment Test of 12 scenarios, each offering four possible behavioral responses. The researchers compared five AI chatbots against this group and found that every chatbot performed at least on par with the humans, with some performing better. Claude performed best, followed by Microsoft Copilot and you.com's smart assistant.

Interestingly, when the chatbots did not choose the best response, they often selected the second most effective option, a pattern similar to human decision-making. This suggests that while AI systems are not perfect, they show real competence in social judgment and probabilistic reasoning.
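
For a concrete sense of how such comparisons can be made, here is a minimal illustrative sketch in Python. It is not the study's code: the scenario names, expert rankings, and picks are hypothetical, and it simply tallies how often a respondent chooses the best or second-best option on a Situational Judgment Test.

```python
# Illustrative sketch only: hypothetical data, not the study's actual method.
from statistics import mean

# Each scenario keys its four options by an expert effectiveness ranking
# (1 = most effective, 4 = least effective). Values here are invented.
EXPERT_RANKINGS = {
    "scenario_01": {"A": 2, "B": 1, "C": 4, "D": 3},
    "scenario_02": {"A": 1, "B": 3, "C": 2, "D": 4},
    # ... the study used 12 scenarios in total
}

def score_picks(picks: dict[str, str]) -> dict[str, float]:
    """Summarize how effective a respondent's chosen options were."""
    ranks = [EXPERT_RANKINGS[s][choice] for s, choice in picks.items()]
    return {
        "mean_rank": mean(ranks),                  # lower is better
        "best_rate": mean(r == 1 for r in ranks),  # share of most-effective picks
        "top2_rate": mean(r <= 2 for r in ranks),  # best or second-best picks
    }

# Hypothetical picks from one chatbot and one human participant.
chatbot_picks = {"scenario_01": "B", "scenario_02": "C"}
human_picks = {"scenario_01": "A", "scenario_02": "B"}

print("chatbot:", score_picks(chatbot_picks))
print("human:  ", score_picks(human_picks))
```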

The study also found differences in reliability among the AI systems. Claude was the most consistent across repeated tests, while Google Gemini sometimes produced conflicting scores on different runs. Nevertheless, the overall performance of all the AI systems exceeded expectations, pointing to their potential as a source of advice on social competence.
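
To illustrate how this kind of test-retest consistency might be quantified, here is another hypothetical sketch (again, not taken from the paper) that computes the share of scenarios on which two repeated runs of the same model pick the same option.

```python
# Illustrative sketch: hypothetical repeated runs, not data from the study.
def agreement_rate(run_a: dict[str, str], run_b: dict[str, str]) -> float:
    """Fraction of shared scenarios where two runs picked the same option."""
    shared = run_a.keys() & run_b.keys()
    if not shared:
        return 0.0
    return sum(run_a[s] == run_b[s] for s in shared) / len(shared)

# Hypothetical picks from two administrations of the same test to one chatbot.
first_run  = {"scenario_01": "B", "scenario_02": "C", "scenario_03": "A"}
second_run = {"scenario_01": "B", "scenario_02": "D", "scenario_03": "A"}

print(f"agreement: {agreement_rate(first_run, second_run):.0%}")  # prints 67%
```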

The researchers noted that although many people already use chatbots for everyday tasks, their performance in complex social interactions still requires further validation. The study showed that large language models perform strongly in simulated social situations, but they lack genuine emotions, which are essential for authentic social behavior.

Key Takeaways:

🌟 AI chatbots outperform humans in complex social judgment, showing potential as social advisors.

🧠 The study compared the performance of multiple chatbots, finding Claude and Microsoft Copilot to be particularly outstanding.

⚖️ Although AI systems perform well in simulated scenarios, further research is needed for their application in real social interactions.