Recently, Reddit user @zefman ran an intriguing experiment: a platform where various large language models (LLMs) play chess against each other in real time. The aim was to gauge these models' performance in a way that is both entertaining and easy to follow.
It is well known that these models are not particularly good at chess, yet @zefman still found some noteworthy moments in this experiment.
In this experiment, @zefman focused on several of the latest models, with GPT-4o standing out as the strongest contender. He also compared it with other models like Claude and Gemini, noting the intriguing differences in their thought processes and reasoning. Through this platform, viewers can see how each model analyzes the board behind every move.
@zefman's method of presenting the chess position is straightforward. Every model receives the exact same prompt, updated each turn with an ASCII rendering of the board, the FEN notation, and that model's previous two moves and accompanying thoughts. This ensures that each model's decisions are based on the same information, allowing for a fairer comparison. Below is an example:
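The original prompt text is not reproduced here, so the following is a minimal sketch of what such a prompt might look like. It assumes the python-chess library for the ASCII board and FEN; the helper name `build_prompt` and the exact wording are illustrative, not @zefman's actual code:

```python
import chess

def build_prompt(board: chess.Board, history: list[str]) -> str:
    """Assemble a turn prompt from the ASCII board, FEN, and recent moves.

    `history` holds the model's earlier moves with its stated reasoning,
    e.g. ["e2e4 - control the center", ...]. The format is hypothetical.
    """
    recent = "\n".join(history[-2:]) if history else "None yet"
    return (
        "You are playing chess. Here is the current position.\n\n"
        f"Board (ASCII):\n{board}\n\n"   # str(board) is python-chess's ASCII diagram
        f"FEN: {board.fen()}\n\n"
        f"Your previous two moves and thoughts:\n{recent}\n\n"
        "Reply with your next move in UCI notation (e.g. e2e4) "
        "and a short explanation of your reasoning."
    )

# Example: the prompt for the opening position, before any moves.
board = chess.Board()
print(build_prompt(board, []))
```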
Additionally, @zefman noticed that some models, particularly the weaker ones, might repeatedly propose invalid moves. To handle this, he gave each model five chances to reselect; if it still could not produce a valid move, a random legal move was played to keep the game going.
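As a rough illustration of that retry-and-fallback rule, here is a minimal sketch. It assumes the python-chess library and a hypothetical `query_model` function that returns the model's reply as a UCI string; this is not @zefman's actual implementation:

```python
import random
import chess

MAX_RETRIES = 5  # five chances to reselect, per @zefman's description

def next_move(board: chess.Board, query_model) -> chess.Move:
    """Ask the model for a move; fall back to a random legal move."""
    for _ in range(MAX_RETRIES):
        reply = query_model(board)  # hypothetical: returns e.g. "e2e4"
        try:
            move = chess.Move.from_uci(reply.strip())
        except ValueError:
            continue  # an unparseable reply counts as an invalid attempt
        if move in board.legal_moves:
            return move
    # The model failed five times: play a random legal move to keep the game going.
    return random.choice(list(board.legal_moves))
```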
His conclusion: GPT-4o remains the strongest, defeating Gemini 1.5 Pro in their chess matchup.
Key Points:
🌟 GPT-4o excels, becoming the strongest language model in the experiment.
♟️ The platform lets different models play chess in real time while exposing the reasoning behind each move.
🔄 Weaker models sometimes propose invalid moves; @zefman's retry-and-fallback rule keeps the games moving.