InternVL2_5-26B is an advanced multimodal large language model (MLLM) developed based on InternVL 2.0. It has been further enhanced through significant training and testing strategies, as well as improvements in data quality. The model retains the core architecture of its predecessor, the 'ViT-MLP-LLM', while integrating the newly pre-trained InternViT along with various pre-trained large language models (LLMs) such as InternLM 2.5 and Qwen 2.5, utilizing randomly initialized MLP projectors. The InternVL 2.5 series models demonstrate exceptional performance in multimodal tasks, particularly in visual perception and multimodal capabilities.