Multi-modal Large Language Models
Provides a comprehensive evaluation of MLLMs
CommonProductProductivityMLLMsEvaluation Tool
This tool aims to assess the generalization ability, trustworthiness, and causal reasoning abilities of the latest proprietary and open-source MLLMs through qualitative research from four modalities: text, code, images, and videos. This is done to increase the transparency of MLLMs. We believe these attributes are representative factors defining the reliability of MLLMs, supporting various downstream applications. Specifically, we evaluated closed-source GPT-4 and Gemini, as well as 6 open-source LLMs and MLLMs. Overall, we evaluated 230 manually designed cases, with qualitative results summarized into 12 scores (i.e., 4 modalities multiplied by 3 attributes). In total, we revealed 14 empirical findings that contribute to understanding the capabilities and limitations of proprietary and open-source MLLMs, enabling more reliable support for multi-modal downstream applications.
Multi-modal Large Language Models Visit Over Time
Monthly Visits
17788201
Bounce Rate
44.87%
Page per Visit
5.4
Visit Duration
00:05:32