llm-colosseum is a benchmarking tool that uses the game Street Fighter 3 to assess the real-time decision-making capabilities of large language models (LLMs). Unlike traditional benchmarking methods, it has the models play through simulated game scenarios, testing how quickly they respond, how intelligently they strategize, how creatively they think, and how well they adapt and recover under pressure.