InternVL2.5-MPO is an advanced series of large multimodal language models built on InternVL2.5 and Mixed Preference Optimization (MPO). This series excels in multimodal tasks, capable of processing image, text, and video data while generating high-quality text responses. The model employs a 'ViT-MLP-LLM' paradigm, optimizing visual processing capabilities through pixel unshuffle operations and dynamic resolution strategies. Furthermore, it supports multiple images and video data, further expanding its application scenarios. In multimodal capability assessments, InternVL2.5-MPO surpasses numerous benchmark models, affirming its leadership in the multimodal field.