InternViT-6B-448px-V2_5
An enhanced visual model based on InternViT-6B-448px-V1-5
CommonProductImageVisual ModelFeature Extraction
InternViT-6B-448px-V2_5 is a visual model built upon InternViT-6B-448px-V1-5, which enhances the visual encoder's ability to extract visual features by utilizing ViT incremental learning and NTP loss (Stage 1.5). It particularly excels in domains where representation is lacking in large-scale network datasets, such as multilingual OCR data and mathematical charts. This model is part of the InternVL 2.5 series, maintaining the same 'ViT-MLP-LLM' architecture as its predecessor, while integrating the newly incrementally pretrained InternViT alongside various pretrained LLMs, including InternLM 2.5 and Qwen 2.5, utilizing a randomly initialized MLP projector.
InternViT-6B-448px-V2_5 Visit Over Time
Monthly Visits
20899836
Bounce Rate
46.04%
Page per Visit
5.2
Visit Duration
00:04:57