InternVL2.5-MPO is a series of multimodal large language models built on InternVL2.5 and trained with Mixed Preference Optimization (MPO). Each model pairs an incrementally pre-trained InternViT vision encoder with a large language model, such as InternLM 2.5 or Qwen 2.5, and connects the two through a randomly initialized MLP projector. The models accept multiple images and video as input, and they handle multimodal tasks by understanding visual content and generating text about it.
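The data flow described above (vision encoder features, then an MLP projector, then the language model) can be illustrated with a small toy sketch. This is not the InternVL2.5-MPO implementation: the dimensions, the ReLU activation, and the helper names `init_mlp`, `project`, and `build_llm_input` are all assumptions chosen for illustration, and the vision encoder and language model are replaced by placeholder vectors.

```python
import random

def init_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Randomly initialize a two-layer MLP (toy stand-in for the projector)."""
    rng = random.Random(seed)

    def layer(n_in, n_out):
        # One weight row per output unit, drawn from a small Gaussian.
        return [[rng.gauss(0.0, 0.02) for _ in range(n_in)] for _ in range(n_out)]

    return {"w1": layer(in_dim, hidden_dim), "w2": layer(hidden_dim, out_dim)}

def linear(weights, x):
    """Matrix-vector product without bias."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def project(mlp, feature):
    """Map one vision feature vector into the language model's embedding size."""
    hidden = [max(0.0, h) for h in linear(mlp["w1"], feature)]  # ReLU, an assumption
    return linear(mlp["w2"], hidden)

def build_llm_input(mlp, image_features, text_embeddings):
    """Project every visual feature, then prepend the results to the text embeddings."""
    visual_tokens = [project(mlp, f) for f in image_features]
    return visual_tokens + text_embeddings

# Hypothetical toy sizes; real models use far larger dimensions.
VIT_DIM, HIDDEN_DIM, LLM_DIM = 8, 16, 12

mlp = init_mlp(VIT_DIM, HIDDEN_DIM, LLM_DIM)
# Placeholder for vision encoder output: 4 patch features of size VIT_DIM.
image_features = [[0.1 * (i + j) for j in range(VIT_DIM)] for i in range(4)]
# Placeholder for embedded text tokens: 3 vectors of size LLM_DIM.
text_embeddings = [[0.0] * LLM_DIM for _ in range(3)]

sequence = build_llm_input(mlp, image_features, text_embeddings)
print(len(sequence), len(sequence[0]))  # 7 12
```

The point of the sketch is only the shape contract: the projector turns each visual feature into a vector of the language model's embedding size, so visual and text tokens can share one input sequence. Because the projector starts from random weights, it has to be trained before the language model can make use of the visual tokens.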