AIM

Pre-training of Large-Scale Autoregressive Image Models

Common Product, Image, Visual Model, Autoregressive Pre-training
This paper introduces AIM, a family of vision models pre-trained with an autoregressive objective. Inspired by their textual counterparts, large language models (LLMs), these models exhibit similar scaling properties. The paper highlights two key findings: (1) the quality of the visual features improves with both model capacity and the amount of training data, and (2) the value of the pre-training objective correlates with model performance on downstream tasks. A 7-billion-parameter AIM pre-trained on 2 billion images achieves 84.0% accuracy on ImageNet-1k with a frozen backbone. Even at this scale, the authors observe no sign of performance saturation, suggesting that AIM may represent a new frontier for training large-scale vision models. AIM's pre-training is similar to that of LLMs and does not require any image-specific strategy to stabilize training at scale.
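To make the idea of an autoregressive objective on images concrete, here is a minimal sketch in plain Python. It rests on assumptions that the summary above does not state: the image is split into non-overlapping patches read in raster order, and the loss is the mean squared error between each patch and a prediction made from the patches before it. The names `patchify`, `autoregressive_loss`, and `copy_last` are hypothetical, and `copy_last` is a toy stand-in for a real model, not AIM's architecture.

```python
from typing import Callable, List, Sequence

Patch = List[float]


def patchify(image: Sequence[Sequence[float]], patch: int) -> List[Patch]:
    """Split a 2D grayscale image into flattened patch x patch tiles, raster order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    patches: List[Patch] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def autoregressive_loss(patches: List[Patch],
                        predict: Callable[[List[Patch]], Patch]) -> float:
    """Mean squared error of predicting patch k from patches 0..k-1 (assumed loss)."""
    total, count = 0.0, 0
    for k in range(1, len(patches)):
        predicted = predict(patches[:k])  # the predictor sees only the prefix
        target = patches[k]
        total += sum((p - t) ** 2 for p, t in zip(predicted, target))
        count += len(target)
    return total / count


def copy_last(prefix: List[Patch]) -> Patch:
    """Toy predictor (hypothetical): repeat the most recent patch."""
    return prefix[-1]


image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
loss = autoregressive_loss(patchify(image, 2), copy_last)
```

A lower value of this loss means that earlier patches predict later ones better. The second finding in the summary is that this pre-training quantity tracks downstream accuracy.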

AIM Visit Over Time

Monthly Visits: 3,598,093
Bounce Rate: 72.95%
Pages per Visit: 2.0
Visit Duration: 00:01:32
