Tsinghua University, in collaboration with Zhipu AI, has developed CogVLM-17B, a domestically built multimodal model with notably strong performance. The model can identify objects within images and distinguish fully visible objects from partially visible ones. Rather than the common shallow-alignment approach, CogVLM-17B employs a deep fusion method, aligning image and text features through four key components: a ViT image encoder, an MLP adapter, a pretrained large language model, and a trainable visual expert module. The model has outperformed Google's models on multiple benchmarks, and its strong results across 14 classic cross-modal benchmarks earned it the nickname "14-sided warrior." This domestic multimodal model offers new insights and possibilities for research in the multimodal field.
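To make "deep fusion" concrete, here is a minimal, illustrative PyTorch sketch of the visual-expert idea described in the CogVLM paper: inside each attention layer, image tokens are routed through their own trainable QKV projections while text tokens keep the language model's original projections, so the two modalities interact at every layer rather than only at the input. All class, function, and parameter names below are illustrative assumptions, not the released CogVLM code.

```python
import torch
import torch.nn as nn


class VisualExpertAttention(nn.Module):
    """Sketch of one CogVLM-style deep-fusion attention layer.

    Image tokens use a separate, trainable QKV projection (the
    "visual expert"); text tokens use the language model's own
    projection. Names and dimensions are illustrative only.
    """

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Language-model QKV projection, applied to text tokens
        self.qkv_text = nn.Linear(d_model, 3 * d_model)
        # Visual-expert QKV projection, applied to image tokens
        self.qkv_image = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, image_mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); image_mask: (batch, seq) bool,
        # True where the token comes from the image encoder.
        b, s, d = x.shape
        # Route each token through the projection for its modality.
        qkv = torch.where(
            image_mask.unsqueeze(-1),
            self.qkv_image(x),
            self.qkv_text(x),
        )
        q, k, v = qkv.chunk(3, dim=-1)
        # Standard multi-head attention over the mixed sequence, so
        # image and text features mix at every layer ("deep fusion").
        q = q.view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        fused = (attn @ v).transpose(1, 2).reshape(b, s, d)
        return self.out(fused)


# Usage: a 10-token sequence whose first 4 tokens are image patches.
layer = VisualExpertAttention(d_model=64, n_heads=4)
x = torch.randn(2, 10, 64)
mask = torch.zeros(2, 10, dtype=torch.bool)
mask[:, :4] = True
y = layer(x, mask)  # (2, 10, 64)
```

In the full model, an analogous expert branch would also be added to each feed-forward block, and only the expert parameters would be trained while the language model stays frozen; this sketch shows the attention half of that idea only.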