llm-colosseum
Evaluate large language models through Street Fighter 3 battles.
Common Product, Programming, Artificial Intelligence, Benchmarking
llm-colosseum is a benchmarking tool that uses the game Street Fighter 3 to assess the real-time decision-making of large language models (LLMs). Unlike traditional benchmarks, it tests how quickly a model responds, how well it strategizes, and how creative, adaptable, and resilient it is, by placing models in simulated game matches.
llm-colosseum Visits Over Time
Monthly Visits: 494,758,773
Bounce Rate: 37.69%
Pages per Visit: 5.7
Visit Duration: 00:06:29