vit-gpt2-image-captioning
PublicFine-tuning an encoder-decoder transformer (ViT-Base-Patch16-224-In21k and DistilGPT2) for image captioning on the COCO dataset
Fine-tuning an encoder-decoder transformer (ViT-Base-Patch16-224-In21k and DistilGPT2) for image captioning on the COCO dataset