The OpenCompass team at Shanghai Artificial Intelligence Laboratory and ModelScope recently announced a major update to their large model evaluation platform, Compass Arena, introducing a new multi-modal large model competition section called Compass Multi-Modal Arena. The new section lets users try out and compare several mainstream multi-modal large models side by side, helping them find the model that best suits their needs.


Compass Multi-Modal Arena is now publicly available on its official website and on the ModelScope page, with a simple interface: users upload an image and enter a question, and the system pairs two anonymous multi-modal large models to answer based on that input. Users then judge the quality of the two responses and vote for the model they think performed better; once the vote is cast, the names of both models are revealed.


The platform also includes a built-in question bank for users who do not have a suitable image to upload. The bank focuses on subjective visual question-answering tasks such as meme understanding, art appreciation, and photography appreciation, and is intended to probe how multi-modal large models perform, and how they feel to use, on subjective tasks.

Compass Multi-Modal Arena Official Website:

https://opencompass.org.cn/arena?type=multimodal

ModelScope Page:

https://modelscope.cn/studios/opencompass/CompassArena

HuggingFace Page:

https://huggingface.co/spaces/opencompass/CompassArena

OpenCompass Multi-Modal Evaluation Tool Open Source Link:

https://github.com/open-compass/VLMEvalKit
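
For readers who want to run multi-modal evaluations locally rather than through the arena, VLMEvalKit (linked above) provides a Python interface for querying registered vision-language models. The snippet below is a minimal sketch of that route; the registry name supported_VLM, the example model key 'idefics_9b_instruct', and the generate() call follow the pattern shown in the project's README, but the exact names and supported models may vary across versions, so treat it as illustrative rather than definitive.

```python
# Minimal sketch of querying a vision-language model through VLMEvalKit's
# Python interface (https://github.com/open-compass/VLMEvalKit).
# Assumption: the model key and generate() signature follow the pattern in the
# project's README; check the repository for the current API and model list.
from vlmeval.config import supported_VLM

# Instantiate one of the registered multi-modal models by its key.
model = supported_VLM['idefics_9b_instruct']()

# Ask a question about a local image: the input is a list mixing image paths
# and text prompts, and the return value is the model's text answer.
answer = model.generate(['./assets/apple.jpg', 'What is in this image?'])
print(answer)
```

Batch evaluation over standard multi-modal benchmarks is typically driven through the repository's run.py script instead; the project documentation lists the supported datasets and command-line options.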