Researchers from Tsinghua University and Zhipu AI have introduced CogAgent, a vision-language model focused on understanding and navigating graphical user interfaces (GUIs). Built around a dual-encoder design that pairs a standard image encoder with a dedicated high-resolution module for fine-grained GUI elements, the model processes high-resolution screenshots, navigates GUIs on both PC and Android platforms, and handles text- and visual question-answering tasks. Potential applications include automating GUI operations, providing in-interface assistance and guidance, and informing new GUI designs and interaction methods. Although still in its early stages, the model is expected to significantly change how people interact with computers.