PixelProse

A large-scale image captioning dataset providing over 16M synthetic image descriptions.

Common Product, Others, Image Captioning, Vision-Language Model
PixelProse, created by tomg-group-umd, is a large-scale dataset of over 16 million detailed image descriptions generated with the vision-language model Gemini 1.0 Pro Vision. It is intended for developing and improving image-to-text technologies and can be used for tasks such as image captioning and visual question answering.

PixelProse Visit Over Time

Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Visit Duration: 00:05:32
