At a recent press conference, Alibaba International's AI team unveiled Ovis, its latest multi-modal large model. The team presents it as a source of new opportunities across industries, pointing to its strong image understanding and data processing capabilities.
Ovis is genuinely multi-modal: it handles text, images, and other data types. Traditional large language models work with text alone, whereas Ovis can also analyze non-text inputs such as images in depth.
For instance, a user can upload a photo of a dish, and Ovis will quickly identify it and return detailed cooking instructions, making it easy to prepare the meal at home.
Ovis can provide recipes through image recognition and processing.
According to the multi-modal evaluation platform OpenCompass, Ovis1.6-Gemma2-9B ranks first in overall score among models with fewer than 30B parameters, ahead of strong models such as MiniCPM-V-2.6. The result suggests that Ovis is highly competitive in its size class.
Ovis's evaluation data on OpenCompass.
Ovis also performs well in mathematical reasoning, object recognition, and complex decision-making. For example, it can accurately solve math problems, identify flower species, and even translate handwritten text. Of the five core advantages the team highlights, the innovative architecture design and high-resolution image processing stand out, as they contribute most to its performance on multi-modal tasks.
Ovis's open-source strategy is another strength. The models are released under the Apache 2.0 license, which allows anyone to use, modify, and redistribute them. All Ovis series models and code are publicly available on GitHub, so developers can easily access them and build on them.
Ovis shows significant potential across a wide range of applications, including autonomous driving, medical diagnosis, and video content understanding. According to Alibaba International, data from the past six months show that business demand for AI keeps growing, with usage doubling roughly every two months. The team expects Ovis to help more businesses improve their operational efficiency.
Key Points:
1️⃣ Ovis is a multi-modal large model that handles text, images, and other data types, with strong all-round performance.
2️⃣ Ovis1.6-Gemma2-9B ranks first in overall score on OpenCompass among models with fewer than 30B parameters, ahead of several top competitors.
3️⃣ Ovis is released under the Apache 2.0 open-source license, with all models and code publicly available on GitHub, so developers can freely use and improve them.