PixelLLM

Pixel-Aligned Language Model

Tags: Common Product, Image, Image Localization, Language Model
PixelLLM is a vision-language model for image localization tasks. Given an input location, it generates descriptive text; given input text, it generates pixel coordinates for dense localization. Pre-trained on the Localized Narratives dataset, the model learns the alignment between words and image pixels. It can be applied to a range of localization tasks, including instruction-following localization, location-conditioned description, and dense object description, and is reported to achieve state-of-the-art performance on datasets such as RefCOCO and Visual Genome.

PixelLLM Visit Over Time

Monthly Visits: 1462
Bounce Rate: 37.07%
Pages per Visit: 2.3
Visit Duration: 00:00:59
