InternVL 2.5
Open-source multimodal large language model series
CommonProductProductivitymultimodallarge language model
InternVL 2.5 is an advanced multimodal large language model series based on InternVL 2.0. While maintaining the core model architecture, it introduces significant enhancements in training and testing strategies as well as data quality. This model explores the relationship between model scalability and performance, systematically investigating performance trends across visual encoders, language models, dataset sizes, and test settings. Comprehensive evaluations across a wide range of benchmarks, including interdisciplinary reasoning, document understanding, multi-image/video comprehension, real-world understanding, multimodal hallucination detection, visual localization, multilingual capabilities, and pure language processing, demonstrate InternVL 2.5's competitiveness comparable to leading commercial models like GPT-4o and Claude-3.5-Sonnet. Notably, it is the first open-source MLLM to achieve over 70% on the MMMU benchmark, attaining a 3.7 percentage point improvement through Chain of Thought (CoT) reasoning, showcasing strong potential for scalability during testing.
InternVL 2.5 Visit Over Time
Monthly Visits
21315886
Bounce Rate
45.50%
Page per Visit
5.2
Visit Duration
00:05:02