Unexpectedly, AI's prowess extends beyond the chessboard and into the treacherous world of social deduction games like "Werewolf"! A recent benchmark, codenamed "Elimination Game," put AI social intelligence to the test, and the results were striking: GPT-4.5 emerged as the champion, clearly outperforming other AI heavyweights like Claude 3.7 Sonnet and DeepSeek R1. This raises the question: has AI's social intelligence really advanced this far?

The rules of the "Elimination Game" are simple but ruthless: up to eight players (AI models or humans) compete, voting to eliminate one player each round until only two remain. The eliminated players then form a jury that decides the ultimate winner. It is a true AI power struggle, filled with betrayal, deception, and strategy!


Players engage in lively debates in a public chat room, presenting arguments, building alliances, and misleading opponents, while private chats allow for secret pacts and hidden agendas. The three rounds of private messaging in each game round are dense with information and strategic maneuvering: players must carefully balance trust and deception, since a single misstep can lead to elimination!

In the final showdown, the two remaining players deliver closing statements to sway the jury of eliminated players, whose vote determines the winner.
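The game loop described above can be sketched in a few lines of Python. This is a minimal, hypothetical simulation of the format only: the votes here are random stand-ins, whereas in the real benchmark each vote comes from a model's debate and private-messaging strategy.

```python
import random

def run_elimination_game(players, final_size=2, seed=0):
    """Simulate the Elimination Game's structure: each round, every active
    player votes to eliminate another; the top vote-getter is removed and
    joins the jury. When only `final_size` players remain, the jury votes
    to pick the winner. Votes are random placeholders, not model decisions."""
    rng = random.Random(seed)
    active = list(players)
    jury = []
    while len(active) > final_size:
        # Each active player casts one vote against another active player.
        tally = {p: 0 for p in active}
        for voter in active:
            tally[rng.choice([p for p in active if p != voter])] += 1
        # The player with the most votes is eliminated; ties break randomly.
        most = max(tally.values())
        eliminated = rng.choice([p for p, v in tally.items() if v == most])
        active.remove(eliminated)
        jury.append(eliminated)
    # Each jury member votes for one finalist; most jury votes wins.
    jury_votes = [rng.choice(active) for _ in jury]
    winner = max(active, key=jury_votes.count)
    return winner, active, jury

winner, finalists, jury = run_elimination_game(
    ["GPT-4.5", "Claude 3.7 Sonnet", "DeepSeek R1", "Model D",
     "Model E", "Model F", "Model G", "Model H"])
print(f"Finalists: {finalists}, jury size: {len(jury)}, winner: {winner}")
```

With eight players and two finalists, six rounds of voting build a six-member jury before the closing statements and jury vote decide the outcome.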


The results of this AI "Werewolf" battle were eye-opening:

GPT-4.5: Social Deduction Master + Top-Tier Strategist = Unstoppable Champion! GPT-4.5 demonstrated exceptional strategic thinking and social deduction skills. With a remarkably low betrayal rate, it focused on alliances and cooperation, yet showed formidable persuasive power in the final round, repeatedly convincing juries to vote in its favor. GPT-4.5 achieved a stunning 62.6% win rate, far surpassing its competitors.

Claude 3.7 Sonnet: A Flexible and Balanced Player, but Slightly Outmatched. Claude 3.7 Sonnet showed somewhat less strategic flexibility than GPT-4.5, but its social deduction and deception skills were still strong. With a moderate betrayal rate, it skillfully balanced cooperation against betrayal, achieving a strong 59.3% win rate.

DeepSeek R1: An Aggressive Player with a High Betrayal Rate, but Lacking Endgame Strength. DeepSeek R1 adopted a highly aggressive, confrontational strategy with a high betrayal rate. However, its weaker social strategy and communication skills made it hard for it to sway the jury, and it finished with a 53.8% win rate.

The "Elimination Game" benchmark test provides valuable insights into AI's social intelligence. GPT-4.5's victory highlights the rapid advancement of AI capabilities. As AI's social intelligence continues to evolve, we may see AI deeply integrated into human society, potentially surpassing human capabilities in certain areas. This AI "Werewolf" competition is just the beginning; the boundaries of AI intelligence continue to expand, promising future surprises and breakthroughs.