InternVL 2.5

Open-source multimodal large language model series

InternVL 2.5 is an advanced multimodal large language model (MLLM) series that builds on InternVL 2.0. It keeps the core model architecture and introduces significant enhancements in training and testing strategies as well as data quality. The work explores the relationship between model scaling and performance, systematically investigating performance trends across visual encoders, language models, dataset sizes, and test-time settings. Comprehensive evaluations across a wide range of benchmarks, covering interdisciplinary reasoning, document understanding, multi-image and video comprehension, real-world understanding, multimodal hallucination detection, visual localization, multilingual capabilities, and pure language processing, show that InternVL 2.5 is competitive with leading commercial models such as GPT-4o and Claude-3.5-Sonnet. Notably, it is the first open-source MLLM to exceed 70% on the MMMU benchmark, gaining 3.7 percentage points from Chain-of-Thought (CoT) reasoning, which indicates strong potential for test-time scaling.
InternVL 2.5 Visit Over Time

Monthly Visits: 20,899,836

Bounce Rate: 46.04%

Pages per Visit: 5.2

Visit Duration: 00:04:57

InternVL 2.5 Alternatives