On September 4, 2024, the Beijing Academy of Artificial Intelligence (BAAI) announced the launch of the FlagEval Large Model Arena, which it describes as the world's first model battle evaluation service to include text-to-video.

The service is open to users, covers approximately 40 large models from China and abroad, and supports custom online or offline evaluations for four major tasks: language question answering, multimodal image-text understanding, text-to-image, and text-to-video. Beyond evaluations on preset questions covering simple understanding, knowledge application, coding ability, and reasoning ability, the arena introduces a ladder-style subjective preference scoring system intended to reveal performance differences between models more precisely.

To keep the process fair, the service conducts evaluations anonymously. Users can take part through the web portal or through a mobile entry point, described as the first of its kind in China. Scoring results are published immediately and compiled into an arena leaderboard that shows how each model performs in head-to-head battles.

BAAI stated that it will open-source the end-to-end data from model battle evaluations to promote the development of the large model evaluation ecosystem. The launch of the FlagEval Large Model Arena extends BAAI's technical work and tooling in model evaluation, giving artificial intelligence researchers and practitioners new testing and evaluation tools.

Try it at: https://flageval.baai.ac.cn/#/home