InternVL scales the ViT architecture to 6 billion parameters and aligns it with a language model, producing the largest open-source vision foundation model to date, a 14B-parameter model. It achieves state-of-the-art performance on 32 generic visual-linguistic benchmarks spanning a wide range of tasks, including visual perception, cross-modal retrieval, and multimodal dialogue.