Recently, the multimodal large model Shusheng Puyu Lingbi (InternLM-XComposer) was upgraded to version 2.5 (IXC-2.5). Developed by the Shanghai Artificial Intelligence Laboratory, the model targets text-image understanding and creation, and its main strength is long-context input and output.

IXC-2.5 is trained on interleaved image-text contexts of 24K tokens and, according to the paper, extends to contexts as long as 96K through RoPE extrapolation. This long-context capability lets IXC-2.5 excel in tasks that require extensive input and output.
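To give a feel for how a model trained at 24K can address 96K positions, the sketch below shows linear position scaling of rotary position embedding (RoPE) angles. This is one common, generic way to stretch RoPE and is an assumption for illustration only; the paper says the extension uses RoPE extrapolation, and the exact method may differ. The function name, `dim`, and `base` values are hypothetical.

```python
TRAIN_LEN = 24 * 1024    # context length used in training (from the article)
TARGET_LEN = 96 * 1024   # extended context length (from the article)

def rope_angles(position, dim=8, base=10000.0, scale=1.0):
    """Rotation angles for one token position.

    Hypothetical helper: scale > 1 compresses positions so that a longer
    sequence reuses the angle range seen during training.
    """
    scaled_pos = position / scale
    return [scaled_pos / (base ** (2 * i / dim)) for i in range(dim // 2)]

# Assumed scaling factor: ratio of target length to training length.
scale = TARGET_LEN / TRAIN_LEN

# With scaling, position 96K lands on the same angles as position 24K did
# in training, so the model never sees angles outside its trained range.
stretched = rope_angles(TARGET_LEN, scale=scale)
trained = rope_angles(TRAIN_LEN, scale=1.0)
```

The trade-off in this kind of scaling is that neighboring positions become harder to tell apart, which is why it is only one of several approaches used in practice.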

Compared to the previous version, IXC-2.5 has undergone three significant upgrades in visual-linguistic understanding:

Ultra-high Resolution Understanding: IXC-2.5 supports high-resolution images of any aspect ratio through its native 560×560 ViT visual encoder.

Fine-grained Video Understanding: IXC-2.5 treats a video as an ultra-high-resolution composite image made up of dozens to hundreds of frames, capturing details through dense sampling and higher per-frame resolution.

Multi-round Multi-image Dialogue: IXC-2.5 supports free-form dialogue over multiple rounds and multiple images, allowing more natural interaction with human users.
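The first two upgrades can be pictured with a small sketch: covering an image of any aspect ratio with 560×560 tiles, and picking evenly spaced frames from a video before they are laid out as one composite image. This is a minimal illustration under stated assumptions, not the partitioning or sampling logic IXC-2.5 actually uses; the function names and the `max_tiles` budget are hypothetical.

```python
import math

TILE = 560  # native ViT input size stated in the article

def tile_grid(width, height, max_tiles=24):
    """Return (cols, rows) of 560x560 tiles covering the image.

    Hypothetical heuristic: start from the smallest covering grid and
    shrink the longer side until the tile budget is met.
    """
    cols = max(1, math.ceil(width / TILE))
    rows = max(1, math.ceil(height / TILE))
    while cols * rows > max_tiles:
        if cols >= rows:
            cols -= 1
        else:
            rows -= 1
    return cols, rows

def sample_frame_indices(total_frames, num_samples):
    """Return evenly spaced frame indices for dense video sampling."""
    if num_samples >= total_frames:
        return list(range(total_frames))
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]

wide = tile_grid(1120, 560)             # a 2:1 image needs a 2x1 grid
frames = sample_frame_indices(100, 4)   # 4 frames out of 100
```

In a setup like this, the tile budget is what caps the visual token count, so a larger budget trades compute for finer detail.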

In addition to enhanced understanding, IXC-2.5 has expanded into two notable applications using additional LoRA parameters for text-image creation:

Webpage Creation: Based on text-image instructions, IXC-2.5 can write HTML, CSS, and JavaScript source code to create webpages.

Writing High-quality Illustrated Articles: IXC-2.5 uses specially designed Chain-of-Thought (CoT) and Direct Preference Optimization (DPO) techniques to significantly improve the quality of its written content.
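For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair from sequence log-probabilities. It shows the general objective only; how IXC-2.5 builds its preference pairs and which `beta` it uses are not stated in this article, so the function name, arguments, and default value here are assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are log-probabilities of each full response under the policy
    being trained and under a frozen reference model. beta=0.1 is an
    assumed value for illustration.
    """
    # How much more the policy favors the chosen response than the
    # reference does, relative to the rejected response.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    return math.log1p(math.exp(-beta * margin))

# Policy identical to the reference: no preference learned yet.
neutral = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Policy already favors the chosen response: the loss is lower.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss falls as the policy raises the likelihood of the preferred article relative to the rejected one, which is what pushes generation toward higher-quality writing.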

IXC-2.5 was evaluated on 28 benchmarks and surpassed existing open-source state-of-the-art models on 16 of them. It also matched or outperformed GPT-4V and Gemini Pro on 16 key tasks. These results indicate strong performance across a wide range of applications.

Paper Link: https://arxiv.org/pdf/2407.03320

Project Link: https://github.com/InternLM/InternLM-XComposer