Research teams from Tsinghua University and other institutions have released AgentBench, the first systematic benchmark for AI agent systems, and used it to evaluate 25 language models. The results show that GPT-4 performs exceptionally well in complex environments and that top commercial language models hold a significant advantage over open-source models. The research team recommends further strengthening the learning capabilities of open-source models.