InternViT-300M-448px-V2_5

An enhanced version of InternViT-300M-448px with improved visual feature extraction.

Common Product, Image, Visual Feature Extraction, Multimodal Learning
InternViT-300M-448px-V2_5 is an enhanced version of InternViT-300M-448px. It uses ViT incremental learning with a next-token-prediction (NTP) loss (Stage 1.5) to strengthen the visual encoder's ability to extract visual features, particularly in domains that are underrepresented in web-scale datasets, such as multilingual OCR data and mathematical charts. The model is the vision encoder of the InternVL 2.5 series, which retains the same 'ViT-MLP-LLM' architecture as its predecessors: the incrementally pre-trained InternViT is paired with a pre-trained LLM, such as InternLM 2.5 or Qwen 2.5, through a randomly initialized MLP projector.
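The 'ViT-MLP-LLM' data flow can be illustrated with a small, dependency-free sketch. It is not the model's actual code: the layer count, the GELU activation, the toy dimensions, and all function names (`init_linear`, `project_tokens`, and so on) are assumptions made for illustration. It only shows how a randomly initialized MLP projector might map visual tokens from the encoder's feature width to the LLM's embedding width.

```python
import math
import random


def init_linear(in_dim, out_dim, rng):
    """Randomly initialize a weight matrix (out_dim x in_dim) and a zero bias."""
    scale = 1.0 / math.sqrt(in_dim)
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(x, layer):
    """Apply y = W x + b to a single feature vector."""
    weight, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def gelu(x):
    """Exact GELU activation, applied element-wise (assumed activation)."""
    return [0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x]


def project_tokens(visual_tokens, fc1, fc2):
    """Map each visual token from the ViT width to the LLM embedding width."""
    return [linear(gelu(linear(tok, fc1)), fc2) for tok in visual_tokens]


# Toy sizes chosen only for illustration; they are NOT the real model dimensions.
VIT_DIM, LLM_DIM, NUM_TOKENS = 8, 16, 4

rng = random.Random(0)
fc1 = init_linear(VIT_DIM, LLM_DIM, rng)   # hypothetical first projector layer
fc2 = init_linear(LLM_DIM, LLM_DIM, rng)   # hypothetical second projector layer

# Stand-in for the visual encoder output: NUM_TOKENS vectors of width VIT_DIM.
visual_tokens = [[rng.gauss(0.0, 1.0) for _ in range(VIT_DIM)]
                 for _ in range(NUM_TOKENS)]

# These projected vectors are what the LLM would consume alongside text embeddings.
llm_inputs = project_tokens(visual_tokens, fc1, fc2)
print(len(llm_inputs), len(llm_inputs[0]))  # prints "4 16"
```

Because the projector starts from random weights, it has to be trained before the LLM can make use of the visual tokens; the sketch only demonstrates the change of shape between the two components.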