2024-07-31 17:56:44.AIbase.10.7k
Shusheng·Puyu Lingbi (InternLM-XComposer) Multimodal Model Upgraded to Version 2.5, Supporting Longer Contexts and GPT-4V-Level Image and Video Understanding
Shusheng·Puyu Lingbi (InternLM-XComposer) version 2.5, developed by the Shanghai Artificial Intelligence Laboratory, focuses on long-context input and output: it operates smoothly at context lengths up to 96K and was trained on interleaved image-text data at 24K context length. Key upgrades include high-resolution image understanding, fine-grained video understanding, and multi-turn multi-image dialogue. In applications, it can create web pages and compose high-quality text-image articles. Evaluations show it surpasses state-of-the-art open-source models across 16 benchmarks and performs on par with GPT-4V and Gemini Pro on key tasks.