InternViT-300M-448px-V2_5
An enhanced version of InternViT-300M-448px with improved visual feature extraction.
Common Product | Image | Visual Feature Extraction | Multimodal Learning
InternViT-300M-448px-V2_5 is an enhanced version of InternViT-300M-448px. It applies ViT incremental learning with next-token prediction (NTP) loss (Stage 1.5) to strengthen the visual encoder's ability to extract visual features, particularly in domains that are underrepresented in large-scale web datasets, such as multilingual OCR data and mathematical charts. The model belongs to the InternViT 2.5 series and serves as the vision encoder of InternVL 2.5, which retains the same 'ViT-MLP-LLM' architecture as its predecessors: the incrementally pre-trained InternViT is combined with various pre-trained LLMs, such as InternLM 2.5 and Qwen 2.5, through randomly initialized MLP projectors.
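The 'ViT-MLP-LLM' data flow described above can be illustrated with a minimal, dependency-free sketch. This is not the model's actual code: the patch size of 14, the toy dimensions, the ReLU activation, and every function name here (`num_patch_tokens`, `init_mlp_projector`, `project`) are assumptions chosen for illustration only. It shows how a 448px input could be counted into patch tokens, and how a randomly initialized MLP projector maps visual tokens into an LLM's embedding width.

```python
import random


def num_patch_tokens(image_size: int = 448, patch_size: int = 14) -> int:
    """Count non-overlapping square patches for a square image.

    patch_size=14 is an assumption for illustration, not taken from the text.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def init_mlp_projector(vit_dim: int, llm_dim: int, seed: int = 0):
    """Randomly initialise a two-layer MLP: vit_dim -> llm_dim -> llm_dim."""
    rng = random.Random(seed)

    def layer(n_in: int, n_out: int):
        scale = n_in ** -0.5  # simple fan-in scaling for the random weights
        return [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]

    return layer(vit_dim, llm_dim), layer(llm_dim, llm_dim)


def _matvec(x, w):
    """Multiply a vector x (length n_in) by a weight matrix w (n_in x n_out)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def project(tokens, projector):
    """Map each visual token into the LLM embedding space."""
    w1, w2 = projector
    projected = []
    for token in tokens:
        hidden = [max(0.0, v) for v in _matvec(token, w1)]  # ReLU as a stand-in
        projected.append(_matvec(hidden, w2))
    return projected


# Toy usage: 4 fake visual tokens of width 8, projected to an LLM width of 16.
toy_rng = random.Random(1)
visual_tokens = [[toy_rng.random() for _ in range(8)] for _ in range(4)]
projector = init_mlp_projector(vit_dim=8, llm_dim=16)
llm_inputs = project(visual_tokens, projector)
```

In a real pipeline the visual tokens would come from the pre-trained ViT rather than random numbers, and the projected tokens would be fed to the LLM alongside the text embeddings. Only the projector starts from random weights.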
InternViT-300M-448px-V2_5 Visits Over Time
Monthly Visits: 20,899,836
Bounce Rate: 46.04%
Pages per Visit: 5.2
Visit Duration: 00:04:57