Zhipu AI Open-Sources CogAgent, a Visual Language Model That Supports GUI Question Answering
Source: ChinaZ (站长之家)
Zhipu AI has open-sourced CogAgent, an 18-billion-parameter vision-language model. CogAgent excels at GUI understanding and navigation and achieves state-of-the-art (SOTA) general performance on multiple benchmarks. The model accepts high-resolution visual input, supports conversational question answering, and can answer questions about any GUI screenshot. CogAgent also handles OCR-related tasks, with its capabilities significantly strengthened through pre-training and fine-tuning. Users can upload a screenshot along with a task and receive a plan, the next action to take, and the specific coordinates of the operation to perform.
© Copyright AIbase Base 2024, Click to View Source - https://www.aibase.com/news/4379