jina-clip-v2
A multilingual multimodal embedding model for text and image retrieval.
Tags: Common Product, Productivity, Multimodal, Multilingual
jina-clip-v2 is a multilingual multimodal embedding model developed by Jina AI. It supports text-image retrieval across 89 languages, processes images at a resolution of 512x512, and offers output dimensions from 64 to 1024 to meet diverse storage and processing needs. The model pairs the text encoder Jina-XLM-RoBERTa with the vision encoder EVA02-L14, producing aligned representations of images and texts through joint training. jina-clip-v2 excels at multimodal search and retrieval, particularly in breaking language barriers and providing cross-modal understanding.
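The description above corresponds to a simple retrieval workflow: encode texts and images into a shared embedding space, optionally truncate the 1024-dimensional output to a smaller size, and rank candidates by cosine similarity. The sketch below illustrates this via the Hugging Face transformers custom-code interface; the encode_text/encode_image methods and the truncate_dim argument follow the jina-clip model card, but the exact signatures and the placeholder image URL are assumptions to verify against the current jinaai/jina-clip-v2 documentation.

```python
# Minimal sketch of multilingual text-image retrieval with jina-clip-v2.
# Assumes the model's remote code exposes encode_text / encode_image and
# accepts truncate_dim (64-1024), as described on the model card.
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)

# Queries in different languages share the same embedding space as images.
texts = [
    "a photo of a cat sleeping on a sofa",       # English
    "una foto de un gato durmiendo en un sofá",  # Spanish
]

# Images may be given as URLs or local file paths (placeholder URL here).
images = ["https://example.com/cat.jpg"]

# truncate_dim selects a smaller output dimension to reduce storage;
# 1024 is the full embedding size.
text_emb = np.asarray(model.encode_text(texts, truncate_dim=512))
image_emb = np.asarray(model.encode_image(images, truncate_dim=512))

# Cosine similarity between every text and every image, used for ranking.
def cosine(a, b):
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

print(cosine(text_emb, image_emb))
```

In practice the truncated 512-dimensional (or smaller) embeddings are what get stored in a vector database, trading a modest drop in retrieval quality for lower index size.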
jina-clip-v2 Visits Over Time
Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Visit Duration: 00:05:32