InternVL 2.5 is a series of advanced multimodal large language models (MLLMs) that builds on InternVL 2.0, with significant improvements to training and testing strategies and to data quality. The series is optimized for visual perception and multimodal understanding and handles both images and text, making it suitable for complex tasks that combine visual and linguistic information.