Tsinghua's Tang Jie Team and Zhipu AI's CogVLM-17B: A Domestic Multimodal Model Challenging GPT-4V

Source: Chinaz (站长之家)
Tsinghua University, in collaboration with Zhipu AI, has developed the domestic multimodal model CogVLM-17B, which demonstrates strong performance. The model can identify objects within images and distinguish fully visible objects from partially visible ones. Instead of shallowly mapping image features into the language model's input space, CogVLM-17B uses a deep fusion method that aligns image and text features through four key components (per the CogVLM paper: a ViT image encoder, an MLP adapter, a pretrained large language model, and a trainable visual expert module). The model has outperformed Google's models on multiple benchmarks, earning it the nickname "14-sided warrior" for its results across 14 multimodal benchmarks and showcasing remarkable multimodal processing capability. This domestic multimodal model offers new ideas and possibilities for research in the multimodal field.
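For readers who want to try the model themselves, the sketch below follows the usage pattern published on the THUDM/cogvlm-chat-hf Hugging Face model card. It is a minimal example rather than official documentation: the image path and prompt are placeholders, and the exact repository names and helper methods may differ across released versions.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer

# CogVLM-17B's chat checkpoint pairs a Vicuna tokenizer with the THUDM weights.
tokenizer = LlamaTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",   # chat variant of CogVLM-17B
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,   # the modeling code ships with the checkpoint
).to("cuda").eval()

# Ask the model about partially visible objects, one of the abilities
# highlighted in the article. "example.jpg" is a placeholder path.
image = Image.open("example.jpg").convert("RGB")
inputs = model.build_conversation_input_ids(
    tokenizer,
    query="Which objects in this image are only partially visible?",
    history=[],
    images=[image],
)

# Add a batch dimension and move everything to the GPU.
inputs = {
    "input_ids": inputs["input_ids"].unsqueeze(0).to("cuda"),
    "token_type_ids": inputs["token_type_ids"].unsqueeze(0).to("cuda"),
    "attention_mask": inputs["attention_mask"].unsqueeze(0).to("cuda"),
    "images": [[inputs["images"][0].to("cuda").to(torch.bfloat16)]],
}

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Strip the prompt tokens and print only the model's answer.
    outputs = outputs[:, inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```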
© AIbase 2024. Source: https://www.aibase.com/news/1927