A Vision Check-up
Learns string relationships between models, examines the visual world
CommonProductImageLanguage ModelVision
This paper systematically evaluates the ability of large language models (LLMs) to generate and recognize increasingly complex visual concepts, and demonstrates how to train initial visual representation learning systems using text models. Although language models cannot directly process pixel-level visual information, this research utilizes code representations of images. While LLM-generated images are not like natural images, the results on image generation and correction suggest that accurately modeling strings can teach language models much about the visual world. Furthermore, experiments on self-supervised visual representation learning using text-model generated images highlight the potential of training visual models capable of semantic evaluation on natural images using only LLMs.
A Vision Check-up Visit Over Time
Monthly Visits
21315886
Bounce Rate
45.50%
Page per Visit
5.2
Visit Duration
00:05:02