Cohere, an AI startup, released Aya Vision, a multimodal "open" AI model, this week through its non-profit research lab, which claims the model is industry-leading.

Aya Vision performs multiple tasks, including image captioning, answering questions about photos, translating text, and generating summaries in 23 major languages. Cohere is also making Aya Vision freely available via WhatsApp, aiming to make the technology more accessible to researchers worldwide.

Cohere notes in its blog that while AI has made significant progress, a large gap remains in model performance across different languages, especially in multimodal tasks involving text and images. "Aya Vision aims to help bridge this gap."

Aya Vision comes in two versions: Aya Vision 32B and Aya Vision 8B. The more capable Aya Vision 32B, which Cohere dubs a "new frontier," outperforms models twice its size on some visual-understanding benchmarks, including Meta's Llama-3.2 90B Vision. Meanwhile, Aya Vision 8B surpasses some models ten times its size in certain evaluations.

Both models are available on the AI development platform Hugging Face under a Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license, subject to Cohere's acceptable use addendum, and cannot be used for commercial applications.
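
For readers who want to try the weights, here is a minimal sketch of what loading the model through Hugging Face's transformers library might look like. The repository ID, pipeline task name, and message format below are assumptions based on the release described above, so check the official model card for exact usage:

```python
# Hedged sketch: running Aya Vision 8B via Hugging Face transformers.
# The repo ID "CohereForAI/aya-vision-8b" and image-text-to-text support
# are assumptions, not confirmed details from this article.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="CohereForAI/aya-vision-8b")

messages = [
    {
        "role": "user",
        "content": [
            # Placeholder image URL for illustration only.
            {"type": "image", "url": "https://example.com/street-sign.jpg"},
            {"type": "text", "text": "What does this sign say? Answer in Hindi."},
        ],
    }
]

print(pipe(text=messages, max_new_tokens=128, return_full_text=False))
```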

Cohere states that Aya Vision was trained using a "diverse" pool of English datasets, which the lab translated and then used to create synthetic annotations for training. Synthetic annotations are AI-generated labels that help a model understand and interpret data during training. While synthetic data has potential drawbacks, competitors like OpenAI are increasingly using it to train models.

Cohere points out that training Aya Vision with synthetic annotations allowed them to reduce resource usage while still achieving competitive performance. "This showcases our commitment to efficiency, achieving more with fewer computational resources."
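
To make that recipe concrete, the sketch below illustrates the general translate-then-annotate pattern the post describes. The helper functions are stubs standing in for a teacher vision-language model and a translation step; this is not Cohere's actual pipeline:

```python
# Minimal sketch of a synthetic-annotation recipe: caption images in
# English with a teacher model, translate the captions, and keep the
# (image, language, caption) triples as training data. All model calls
# are stubs for illustration.
from dataclasses import dataclass

@dataclass
class Example:
    image_path: str
    language: str
    caption: str

def caption_in_english(image_path: str) -> str:
    # Stand-in for a teacher vision-language model.
    return f"A synthetic English caption for {image_path}"

def translate(text: str, language: str) -> str:
    # Stand-in for a machine-translation step.
    return f"[{language}] {text}"

def build_synthetic_dataset(images, languages):
    dataset = []
    for image_path in images:
        english = caption_in_english(image_path)
        for lang in languages:
            dataset.append(Example(image_path, lang, translate(english, lang)))
    return dataset

if __name__ == "__main__":
    for ex in build_synthetic_dataset(["cat.jpg"], ["fr", "hi", "ar"]):
        print(ex)
```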

To further support the research community, Cohere also released a new benchmark suite, AyaVisionBench, designed to test models' capabilities in combined vision and language tasks, such as identifying differences between two images and converting screenshots to code.
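
A hypothetical harness for collecting model outputs on the benchmark might look like the following. The dataset ID, split, and column names are assumptions; the real schema is documented on the benchmark's Hugging Face page:

```python
# Hedged sketch: iterating over AyaVisionBench to collect model outputs.
# Dataset ID, split, and columns below are assumptions, not confirmed.
from datasets import load_dataset

def run_benchmark(generate, limit=10):
    """`generate` maps (image, prompt) -> model response string."""
    bench = load_dataset("CohereForAI/AyaVisionBench", split="test")  # assumed ID/split
    results = []
    for example in bench.select(range(min(limit, len(bench)))):
        results.append({
            "language": example.get("language"),  # assumed column
            "prompt": example.get("prompt"),      # assumed column
            "response": generate(example.get("image"), example.get("prompt")),
        })
    return results

if __name__ == "__main__":
    # Stub model for demonstration; swap in a real vision-language model.
    print(run_benchmark(lambda image, prompt: "stub response"))
```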

The AI industry currently faces a so-called "evaluation crisis," largely stemming from the widespread use of popular benchmarks whose aggregate scores correlate poorly with proficiency on the tasks most AI users actually care about. Cohere claims AyaVisionBench offers a "broad and challenging" framework for evaluating models' cross-lingual and multimodal understanding.

Official blog: https://cohere.com/blog/aya-vision

Key Highlights:

🌟 Aya Vision, touted by Cohere as industry-best, performs various language and vision tasks.

💡 Aya Vision comes in 32B and 8B versions, outperforming larger competitor models.

🔍 Cohere also released a new benchmark, AyaVisionBench, to address AI model evaluation challenges.