VisualLanguageModel
PublicA custom Vision-Language Model (VLM) built from scratch, using SigLip for contrastive learning and a ViT-based encoder to generate meaningful image captions and semantic descriptions.
A custom Vision-Language Model (VLM) built from scratch, using SigLip for contrastive learning and a ViT-based encoder to generate meaningful image captions and semantic descriptions.