jina-clip-v2

A multilingual multimodal embedding model for text and image retrieval.

Common Product, Productivity, Multimodal, Multilingual
jina-clip-v2 is a multilingual, multimodal embedding model developed by Jina AI. It supports image retrieval in 89 languages and processes images at a resolution of 512x512. Output embeddings range from 64 to 1024 dimensions, so storage and compute costs can be matched to the application. The model pairs the Jina-XLM-RoBERTa text encoder with the EVA02-L14 vision encoder and trains them jointly to produce aligned representations of images and text. It is well suited to multimodal search and retrieval, particularly for queries that cross languages or cross between text and images.
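
To illustrate how a choice of output dimension might be used in retrieval, here is a minimal sketch in plain Python. It assumes that a smaller embedding is obtained by keeping the leading components of the full 1024-dimensional vector and rescaling it to unit length, and that text and image vectors are compared by cosine similarity; the description above does not spell out either detail. The vectors are random placeholders rather than real model output, and the helper names (truncate_and_normalize, cosine, rank_images) are hypothetical.

```python
import math
import random


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def rank_images(text_vec, image_vecs, dim):
    """Return image indices sorted by similarity to the text query, best first."""
    query = truncate_and_normalize(text_vec, dim)
    scores = [cosine(query, truncate_and_normalize(v, dim)) for v in image_vecs]
    return sorted(range(len(image_vecs)), key=lambda i: scores[i], reverse=True)


# Placeholder vectors standing in for model output (assumption: 1024-dim floats).
rng = random.Random(0)
text_vec = [rng.gauss(0, 1) for _ in range(1024)]
image_vecs = [
    [x + rng.gauss(0, 0.1) for x in text_vec],  # image close to the query
    [rng.gauss(0, 1) for _ in range(1024)],     # unrelated image
]

# Rank using only 64 dimensions, the smallest size mentioned above.
ranking = rank_images(text_vec, image_vecs, dim=64)
print(ranking)
```

Smaller dimensions reduce index size and comparison cost roughly in proportion, usually with some loss in ranking quality, so the right setting depends on the collection and accuracy requirements.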
jina-clip-v2 Visit Over Time

Monthly Visits: 20,899,836

Bounce Rate: 46.04%

Pages per Visit: 5.2

Visit Duration: 00:04:57

jina-clip-v2 Alternatives