Zhipu AI Open-Sources Visual Language Model CogAgent, Which Supports GUI Q&A

站长之家 (Chinaz)

Published in AI News · 1 min read · Dec 21, 2023

Zhipu AI has open-sourced CogAgent, a vision-language model with 18 billion parameters. CogAgent excels at GUI understanding and navigation and achieves state-of-the-art (SOTA) general performance on multiple benchmarks. The model supports high-resolution visual input and conversational question answering, and it can answer questions about any GUI screenshot. CogAgent also supports OCR-related tasks, with capabilities substantially improved through pre-training and fine-tuning. Users can upload a screenshot for task inference and receive a plan, the next action, and the specific coordinates of the operation to perform.

Visual Language Model · GUI Graphic Interface Q&A · Open Source

This article is from AIbase Daily

Welcome to the [AI Daily] column! This is your daily guide to the world of artificial intelligence. Every day, we bring you the hot topics in AI with a focus on developers, helping you follow technical trends and learn about innovative AI product applications.

Created by the AIbase Daily Team
