InternVL2-8B-MPO is a multimodal large language model (MLLM) that strengthens multimodal reasoning through a Mixed Preference Optimization (MPO) process. The work introduces an automated pipeline for constructing preference data and uses it to build MMPR, a large-scale multimodal reasoning preference dataset. Starting from the InternVL2-8B base model, InternVL2-8B-MPO is fine-tuned on MMPR, yielding stronger multimodal reasoning with fewer hallucinations. The model achieves an accuracy of 67.0% on MathVista, surpassing InternVL2-8B by 8.7 points and performing comparably to the much larger InternVL2-76B model.
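
As a minimal sketch of how the model might be loaded for single-image inference, the snippet below follows the loading convention used by other InternVL2 checkpoints on Hugging Face (`AutoModel` with `trust_remote_code` and the remote-code `chat()` helper). The repo id `OpenGVLab/InternVL2-8B-MPO`, the single-tile 448×448 preprocessing, and the ImageNet normalization constants are assumptions here; the official model card documents the exact pipeline, including dynamic tiling.

```python
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModel, AutoTokenizer

# Repo id assumed from the model name; verify against the actual model card.
MODEL_ID = "OpenGVLab/InternVL2-8B-MPO"

# InternVL2 checkpoints ship custom modeling code, so trust_remote_code is required.
model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True, use_fast=False)

# Simplified single-tile preprocessing (the official pipeline adds dynamic tiling):
# resize to the 448x448 ViT input and normalize with ImageNet statistics.
preprocess = transforms.Compose([
    transforms.Resize((448, 448), interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
image = Image.open("example.jpg").convert("RGB")  # example.jpg is a placeholder path
pixel_values = preprocess(image).unsqueeze(0).to(torch.bfloat16).cuda()

# The chat() helper comes from the remote code; this signature mirrors other
# InternVL2 model cards and may differ for this checkpoint.
question = "<image>\nSolve the problem shown in the image step by step."
response = model.chat(tokenizer, pixel_values, question,
                      dict(max_new_tokens=512, do_sample=False))
print(response)
```

The `<image>` placeholder in the prompt marks where the visual tokens are injected, matching the convention in other InternVL2 usage examples.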