Research teams from several Chinese institutions have recently released Infinity-MM, a massive multimodal dataset, and used it to train Aquila-VL-2B, a compact but high-performing vision-language model. The release adds new momentum to the development of multimodal AI.

The Infinity-MM dataset is strikingly large, comprising four major categories of data: 10 million image descriptions, 24.4 million general visual instruction samples, 6 million selected high-quality instruction samples, and 3 million samples generated by AI models such as GPT-4. The research team used the open-source model RAM++ to analyze images and extract information, and ensured the quality and diversity of the generated data through a unique six-category classification system.
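For a sense of scale, the short sketch below tallies the four reported categories; the dictionary keys are illustrative labels chosen for this article, not official field names in the dataset.

```python
# Reported composition of Infinity-MM (category labels are illustrative,
# not official dataset field names).
INFINITY_MM_CATEGORIES = {
    "image_descriptions": 10_000_000,
    "general_visual_instructions": 24_400_000,
    "selected_high_quality_instructions": 6_000_000,
    "model_generated_instructions": 3_000_000,  # produced by GPT-4-class models
}

total = sum(INFINITY_MM_CATEGORIES.values())
print(f"total samples: {total:,}")  # 43,400,000
for name, count in INFINITY_MM_CATEGORIES.items():
    print(f"  {name}: {count / total:.1%} of the dataset")
```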

In terms of model architecture, Aquila-VL-2B is built on the LLaVA-OneVision framework, combining the Qwen-2.5 language model with the SigLIP vision encoder. The research team adopted a four-stage progressive training method: starting with basic image-text alignment, then moving to general visual tasks, then task-specific instruction handling, and finally incorporating synthetic data, while progressively raising the cap on image resolution.
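The staged curriculum can be pictured as a simple training schedule. The sketch below is only illustrative: the stage descriptions paraphrase the article, and the resolution caps are hypothetical placeholders, since the article states only that the cap increases from stage to stage.

```python
# Illustrative four-stage schedule following the progression described above.
# Resolution caps are hypothetical placeholders; only their increase is implied.
TRAINING_STAGES = [
    {"stage": 1, "focus": "basic image-text alignment", "max_image_res": 384},
    {"stage": 2, "focus": "general visual instruction tasks", "max_image_res": 512},
    {"stage": 3, "focus": "task-specific instruction data", "max_image_res": 768},
    {"stage": 4, "focus": "synthetic instruction data (GPT-4-generated)", "max_image_res": 1024},
]

for s in TRAINING_STAGES:
    print(f"Stage {s['stage']}: {s['focus']} (image cap {s['max_image_res']}px)")
```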

Despite having only 2 billion parameters, Aquila-VL-2B performed strongly across benchmark tests. It achieved a leading score of 54.9% on the multimodal understanding benchmark MMStar and 59% on the mathematical reasoning benchmark MathVista, significantly outperforming comparable systems. On general image understanding benchmarks, the model scored 43% on HallusionBench and 75.2% on MMBench.
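Collected in one place, the figures quoted above are as follows (percent scores as reported in the article):

```python
# Benchmark scores for Aquila-VL-2B as quoted in the article (percent).
AQUILA_VL_2B_SCORES = {
    "MMStar (multimodal understanding)": 54.9,
    "MathVista (mathematical reasoning)": 59.0,
    "HallusionBench (general image understanding)": 43.0,
    "MMBench (general image understanding)": 75.2,
}

for benchmark, score in sorted(AQUILA_VL_2B_SCORES.items(), key=lambda kv: -kv[1]):
    print(f"{benchmark:<45} {score:5.1f}")
```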

The study found that synthetic data contributed significantly to the model's performance: in ablation experiments, removing this data lowered average performance by 2.4%. From the third training stage onward, Aquila-VL-2B clearly outperformed reference models such as InternVL2-2B and Qwen2VL-2B, and in the fourth stage the gains grew more pronounced as the data volume increased.

It is worth noting that the research team has released both the dataset and the model to the research community, which should greatly advance multimodal AI technology. The model was trained on Nvidia A100 GPUs and also supports China's domestically developed chips, demonstrating strong hardware adaptability.
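Since the weights build on LLaVA-OneVision, one plausible way to try the released model is through the Hugging Face Transformers LLaVA-OneVision classes. The following is a minimal sketch under assumptions: the repository id and image path are placeholders not confirmed by the article, so check the team's official release page for the actual loading instructions.

```python
# Hypothetical inference sketch using Transformers' LLaVA-OneVision classes,
# which match the architecture the model is based on. The repo id below is an
# assumption; consult the official release for the exact name.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "BAAI/Aquila-VL-2B-llava-qwen"  # assumed repository id
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")  # placeholder image path
conversation = [
    {"role": "user",
     "content": [{"type": "image"},
                 {"type": "text", "text": "Describe this image."}]},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```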