Alibaba International Launches Latest Multimodal Large Model Ovis, Providing Cooking Steps by Analyzing Dishes

AIbase基地

Published in AI News · 5 min read · Sep 20, 2024


At a recent press conference, Alibaba International's AI team unveiled its latest multimodal large model, Ovis. The model combines image understanding with data processing, and the team expects it to open new opportunities across a range of industries.

Ovis handles text, images, and other data types. Unlike traditional large language models, which work on text alone, Ovis also performs in-depth analysis of non-text information such as images.

For instance, users need only upload a photo of a dish, and Ovis can quickly identify it and provide detailed cooking instructions, helping users easily prepare delicious meals.

Ovis can provide recipes through image recognition and processing.

According to data from the multimodal evaluation platform OpenCompass, Ovis1.6-Gemma2-9B ranks first in the comprehensive evaluation among models with fewer than 30B parameters, ahead of models such as MiniCPM-V-2.6.

Ovis's evaluation data on OpenCompass.

Ovis also performs well in mathematical reasoning, object recognition, and complex decision-making. For example, it can solve math problems, identify flower species, and translate handwritten text. Of the five core advantages cited for Ovis, two stand out: its architecture design and its high-resolution image processing, both of which improve its performance on multimodal tasks.

Ovis is also open source. It is released under the Apache 2.0 license, so users can freely use and modify the model. All Ovis series models and code are publicly available on GitHub, where developers can access them and build on them.

Ovis shows potential in application scenarios such as autonomous driving, medical diagnosis, and video content understanding. According to Alibaba's international team, data from the past six months shows that business demand for AI continues to grow, with usage doubling every two months on average. The team expects Ovis to help more businesses improve their operational efficiency.
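To put the doubling figure in perspective, here is a minimal sketch of the arithmetic. It assumes the two-month doubling rate held steady across the whole six months, which the article does not confirm (it only gives an average), and the function name `growth_factor` is hypothetical.

```python
def growth_factor(months: float, doubling_period_months: float = 2.0) -> float:
    """Return the multiplicative growth after `months` of steady doubling.

    Assumption: usage doubles once every `doubling_period_months`,
    with no variation from period to period.
    """
    return 2.0 ** (months / doubling_period_months)


# Six months at one doubling every two months is three doublings.
six_month_factor = growth_factor(6)
print(f"Growth over six months: {six_month_factor:.0f}x")
```

Under that assumption, three doublings in six months work out to roughly an eightfold increase in usage.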

Key Points:

1️⃣ Ovis is a multi-modal large model capable of handling various data types including text and images, showcasing excellent comprehensive abilities.

2️⃣ Ovis1.6-Gemma2-9B ranks first in comprehensive evaluations on OpenCompass among models with parameters below 30B, outperforming several top competitors.

3️⃣ Ovis adopts the Apache 2.0 open-source license, with all models and code publicly available on GitHub, allowing developers to freely use and improve upon them.


This article is from AIbase Daily



