
clip-synthetic-captions

Public

Tiny-scale experiment showing that CLIP models trained on detailed captions generated by multimodal models (CogVLM and LLaVA 1.5) outperform models trained on the original alt-texts across a range of classification and retrieval tasks.
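A minimal sketch of the kind of caption-versus-alt-text comparison this setup implies, assuming an open_clip checkpoint; the model tag, image path, and caption strings are illustrative placeholders, not the repository's actual code:

```python
import torch
import open_clip
from PIL import Image

# Load a pretrained CLIP model (architecture and checkpoint tags are placeholders).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# One image paired with a detailed synthetic-style caption and a typical alt-text
# (file path and strings are hypothetical examples).
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
texts = tokenizer([
    "A golden retriever catching a red frisbee on a grassy field at sunset.",  # detailed caption
    "IMG_4032",                                                                # original alt-text
])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    # Normalize embeddings and score image-text similarity, as in CLIP retrieval evaluation.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    similarity = image_features @ text_features.T

print(similarity)  # higher score = closer image-text match
```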

Created: 2024-03-05T19:57:49
Updated: 2024-03-31T02:25:46
Stars: 3
Stars increase: 0