MiniGemini

A multimodal large language model capable of understanding and generating images

Common Product, Programming, Multimodal, Visual Language Model
Mini-Gemini is a multimodal visual language model that supports a series of dense and MoE large language models ranging from 2B to 34B parameters. It handles image understanding, reasoning, and generation. Built on LLaVA, it uses dual vision encoders to provide low-resolution visual embeddings and high-resolution candidate regions. Patch-level information mining then relates the high-resolution regions to the low-resolution visual queries, and the result is fused with text for understanding and generation tasks. It supports multiple visual understanding benchmarks, including COCO, GQA, OCR-VQA, and VisualGenome.
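The patch-level mining step described above can be pictured as a small cross-attention: each low-resolution visual token acts as a query over the high-resolution patches of its own region. The following is a minimal NumPy sketch under stated assumptions, not the project's actual implementation. The function name `patch_info_mining`, the array shapes, and the omission of learned projections and of the final MLP are all simplifications chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def patch_info_mining(low_res_queries, high_res_regions):
    """Hypothetical sketch of patch-level mining (no learned weights).

    low_res_queries:  (N, C) one embedding per low-resolution visual token
    high_res_regions: (N, M, C) M high-resolution patch features per token
    Returns the enhanced tokens (N, C) and the attention weights (N, M).
    """
    n, c = low_res_queries.shape
    # Similarity between each query and the patches of its own region only.
    scores = np.einsum("nc,nmc->nm", low_res_queries, high_res_regions)
    weights = softmax(scores / np.sqrt(c), axis=-1)
    # Weighted sum of high-resolution features, added back to the query.
    mined = np.einsum("nm,nmc->nc", weights, high_res_regions)
    return low_res_queries + mined, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    queries = rng.normal(size=(4, 8))      # 4 low-res tokens, 8 channels
    regions = rng.normal(size=(4, 16, 8))  # 16 high-res patches per token
    tokens, attn = patch_info_mining(queries, regions)
    print(tokens.shape, attn.shape)
```

Because each query attends only to its own region, the number of visual tokens passed to the language model stays at N regardless of the high-resolution input size, which is the main appeal of this kind of design.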

MiniGemini Visit Over Time

Monthly Visits: 494
Bounce Rate: 60.69%
Pages per Visit: 1.0
Visit Duration: 00:00:00
