Google Vision Transformer
An image recognition model based on the Transformer architecture
CommonProductImageArtificial IntelligenceImage Recognition
Google Vision Transformer is an image recognition model based on the Transformer encoder. It is pre-trained on a large-scale image dataset and can be used for tasks such as image classification. The model is pre-trained on the ImageNet-21k dataset and fine-tuned on the ImageNet dataset, possessing strong image feature extraction capabilities. The model processes image data by dividing the image into fixed-size image blocks and linearly embedding these blocks. Additionally, the model incorporates positional encoding before the input sequence to handle sequential data within the Transformer encoder. Users can perform image classification and other tasks by adding a linear layer on top of the pre-trained encoder. The advantages of Google Vision Transformer lie in its powerful image feature learning ability and widespread applicability. The model is freely available for use.
Google Vision Transformer Visit Over Time
Monthly Visits
515580771
Bounce Rate
37.20%
Page per Visit
5.8
Visit Duration
00:06:42