MiniCPM-V2.6, an edge-side multimodal AI model with only 8 billion parameters, achieves state-of-the-art (SOTA) results among models under 20 billion parameters in single-image, multi-image, and video understanding, significantly advancing multimodal capability on edge devices and reaching overall parity with GPT-4V.


Here is a summary of its features:

  1. Model Characteristics: MiniCPM-V2.6 achieves comprehensive leadership in core edge capabilities such as single-image, multi-image, and video understanding, and for the first time brings real-time video understanding and joint multi-image understanding to edge devices, moving closer to complex real-world scenarios.

  2. Efficiency and Performance: Despite its small footprint, the model achieves extremely high token density (pixels encoded per visual token), twice that of GPT-4o, which translates into very high operational efficiency on edge devices.

  3. Edge Friendliness: After quantization the model needs only 6 GB of memory and reaches edge inference speeds of up to 18 tokens per second, 33% faster than its predecessor; it supports multiple languages and inference frameworks.

  4. Functional Expansion: MiniCPM-V2.6 extends high-definition image parsing, including OCR, from single images to multi-image and video scenarios while reducing the number of visual tokens, saving resources.

  5. Inference Capabilities: It excels at multi-image understanding and complex reasoning tasks, such as producing step-by-step instructions for adjusting a bicycle seat or grasping the underlying humor in memes.

  6. Multi-image ICL: The model supports in-context few-shot learning with multiple images, quickly adapting to domain-specific tasks and improving output stability.

  7. High-definition Visual Architecture: A unified visual architecture preserves the model's OCR capability and allows smooth extension from single images to multi-image and video inputs.

  8. Ultra-low Hallucination Rate: MiniCPM-V2.6 performs strongly in hallucination evaluations, demonstrating its reliability.
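The efficiency figures above can be sanity-checked with quick arithmetic. The image size and visual-token count below are illustrative assumptions (the exact numbers come from the model card and may differ):

```python
# Token density: pixels encoded per visual token.
# Assumed illustrative figures: a 1344x1344 input image encoded into
# 640 visual tokens.
pixels = 1344 * 1344            # ~1.8M pixels in one high-resolution image
visual_tokens = 640             # tokens the vision encoder produces for it
token_density = pixels / visual_tokens
print(f"pixels per visual token: {token_density:.0f}")

# "18 tokens/s, 33% faster than its predecessor" implies the predecessor
# decoded at roughly 18 / 1.33 tokens per second.
predecessor_speed = 18 / 1.33
print(f"implied predecessor speed: {predecessor_speed:.1f} tokens/s")
```

A higher token density means fewer visual tokens per image, which is what keeps both memory use and decoding latency low on edge hardware.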

The release of MiniCPM-V2.6 is significant for the development of edge AI: it not only raises the bar for multimodal processing but also demonstrates that high-performance AI is achievable on resource-constrained edge devices.
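The multi-image ICL capability mentioned above works by prepending a few worked image/answer pairs to the query. The sketch below only illustrates the prompt structure; the role/content message layout follows the chat-style format shown on the model's Hugging Face model card, and the image variables are plain-string placeholders (real use would pass PIL images to the model's chat interface):

```python
# Sketch of a multi-image few-shot (in-context learning) prompt.
# Strings stand in for PIL.Image objects so the structure runs on its own.
example_img_1 = "<image: defect-free part>"   # placeholder for a PIL image
example_img_2 = "<image: scratched part>"     # placeholder for a PIL image
query_img = "<image: part to inspect>"        # placeholder for a PIL image

question = "Is this part defective? Answer OK or DEFECT."

# The few-shot pairs teach the task format before the real query is asked.
msgs = [
    {"role": "user", "content": [example_img_1, question]},
    {"role": "assistant", "content": ["OK"]},
    {"role": "user", "content": [example_img_2, question]},
    {"role": "assistant", "content": ["DEFECT"]},
    {"role": "user", "content": [query_img, question]},
]

# With the real model, msgs would be passed to its chat interface
# (see the model card for the exact call); here we just show the prompt shape.
print(len(msgs), "turns; last role:", msgs[-1]["role"])
```

Because the examples pin down the expected output format, the model's answers to the final query become more stable, which is the "improving output stability" benefit noted above.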

MiniCPM-V2.6 open-source addresses:

GitHub: https://github.com/OpenBMB/MiniCPM-V

HuggingFace: https://huggingface.co/openbmb/MiniCPM-V-2_6

llama.cpp / ollama / vllm deployment tutorial: https://modelbest.feishu.cn/docx/Duptdntfro2Clfx2DzuczHxAnhc

MiniCPM series open-source address: https://github.com/OpenBMB/MiniCPM