InternVL2_5-4B is an advanced multimodal large language model (MLLM) that retains the core architecture of InternVL 2.0 while substantially improving its training and testing strategies and data quality. The model handles image-and-text-to-text tasks and excels at multimodal reasoning, mathematical problem solving, OCR, and chart and document understanding. As an open-source model, it gives researchers and developers a powerful foundation for exploring and building intelligent applications that combine visual and linguistic understanding.