MiniCPM-V2.6 is an on-device (edge) multimodal AI model with only 8 billion parameters. Among models with fewer than 20 billion parameters, it achieves state-of-the-art (SOTA) results in three areas: single-image, multi-image, and video understanding. This significantly strengthens multimodal capabilities at the edge and brings the model to GPT-4V level across the board.
Here is a summary of its features:
Model Characteristics: MiniCPM-V2.6 leads across core edge capabilities, including single-image, multi-image, and video understanding. It is also the first to bring real-time video understanding and joint multi-image understanding to edge devices, moving closer to complex real-world scenarios.
Efficiency and Performance: Despite its small footprint, the model has an extremely high token density (the number of pixels encoded into each visual token), twice that of GPT-4o, which makes it highly efficient on edge devices.
Edge Friendliness: After quantization, the model needs only 6 GB of memory and reaches on-device inference speeds of up to 18 tokens per second, 33% faster than its predecessor. It also supports multiple languages and multiple inference frameworks.
Functional Expansion: MiniCPM-V2.6 extends its high-resolution image parsing and OCR capabilities from single images to multi-image and video scenarios, while using fewer visual tokens and thus saving resources.
Reasoning Capabilities: It performs strongly on multi-image understanding and complex reasoning tasks, such as giving step-by-step instructions for adjusting a bicycle seat or recognizing the punchline hidden in a meme.
Multi-image ICL: The model supports multi-image in-context learning (ICL) from a few examples, letting it adapt quickly to domain-specific tasks and produce more stable outputs.
High-definition Visual Architecture: A unified visual architecture preserves the model's OCR strengths while letting it scale smoothly from single images to multiple images and video.
Ultra-low Hallucination Rate: MiniCPM-V2.6 performs very well on hallucination evaluations, indicating that its outputs are reliable.
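The efficiency figures above can be sanity-checked with simple arithmetic. The Python sketch below is illustrative only: the 1344 x 1344 image and the 640-token count are hypothetical inputs, 4-bit quantization is an assumption (the text only says "after quantization"), and "33% faster" is read as 1.33 times the predecessor's speed.

```python
# Back-of-envelope checks for the efficiency claims above.
# All inputs in the __main__ section are illustrative assumptions, not official figures.


def token_density(width_px: int, height_px: int, visual_tokens: int) -> float:
    """Pixels encoded per visual token (higher means a more compact encoding)."""
    return (width_px * height_px) / visual_tokens


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def implied_baseline_speed(new_speed: float, speedup_pct: float) -> float:
    """Predecessor's speed, if new_speed is speedup_pct percent faster than it."""
    return new_speed / (1 + speedup_pct / 100)


if __name__ == "__main__":
    # Hypothetical image: 1344 x 1344 pixels encoded into 640 visual tokens.
    print(f"token density: {token_density(1344, 1344, 640):.0f} pixels/token")
    # 8B parameters, ASSUMING 4-bit quantization; weights only, so the KV cache,
    # activations, and runtime overhead are not included.
    print(f"4-bit weights: {weight_memory_gb(8e9, 4):.1f} GB")
    # 18 tokens/s, described as 33% faster than the predecessor.
    print(f"implied predecessor speed: {implied_baseline_speed(18, 33):.1f} tokens/s")
```

Under these assumptions, 4-bit weights alone come to about 4 GB. The remaining headroom up to the quoted 6 GB would plausibly go to activations, the KV cache, and runtime overhead, though that breakdown is an assumption rather than a documented figure.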
MiniCPM-V2.6 is a significant step for edge AI: it not only improves multimodal processing but also shows that high-performance AI is achievable on resource-constrained edge devices.
MiniCPM-V2.6 open-source links:
GitHub:
https://github.com/OpenBMB/MiniCPM-V
Hugging Face:
https://huggingface.co/openbmb/MiniCPM-V-2_6
Deployment tutorial for llama.cpp, Ollama, and vLLM:
https://modelbest.feishu.cn/docx/Duptdntfro2Clfx2DzuczHxAnhc
MiniCPM series open-source repository:
https://github.com/OpenBMB/MiniCPM