2023-10-10 14:14:44 · AIbase
CogVLM-17B from Tsinghua's Tang Jie and Zhipu AI: A Chinese Multimodal Model Challenging GPT-4V
CogVLM-17B, developed jointly by Tsinghua University and Zhipu AI, is a Chinese multimodal model with strong reported performance. Beyond recognizing objects in images, it can distinguish fully visible objects from partially visible ones. The model uses a deep fusion approach that aligns image features with text features through four key components: a vision transformer (ViT) encoder, an MLP adapter, a pretrained large language model, and a visual expert module. CogVLM-17B is reported to outperform Google's models in several areas and has been dubbed the '14-sided warrior' for its all-round multimodal capabilities.