ViTMatte
Boosting Image Matting with Pretrained Plain Vision Transformers
Common Product, Image, Image Segmentation, Vision Transformer
ViTMatte is an image matting system built on pretrained plain Vision Transformers (ViTs). It balances performance against computational cost by combining a hybrid attention mechanism with a convolutional neck, and it adds a detail capture module to supply the fine-grained detail that matting requires. According to its authors, ViTMatte is the first work to unlock the potential of ViTs for image matting through a simple adaptation. It inherits ViT's strengths in pretraining strategies, concise architecture design, and flexible inference strategies. On Composition-1k and Distinctions-646, the two most commonly used image matting benchmarks, ViTMatte achieves state-of-the-art performance and outperforms previous work by a large margin.
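To make the task concrete: a matting model such as ViTMatte predicts an alpha matte, a per-pixel opacity in [0, 1], rather than a hard mask. The sketch below is not ViTMatte code. It only illustrates the standard compositing relation I = alpha * F + (1 - alpha) * B that gives the matte its meaning. The function name `composite` and the tiny 2x2 grayscale images are assumptions made up for this illustration.

```python
def composite(foreground, background, alpha):
    """Blend two same-sized grayscale images using a per-pixel alpha matte.

    Illustrative helper (hypothetical, not part of ViTMatte): each output
    pixel is alpha * foreground + (1 - alpha) * background.
    """
    return [
        [a * f + (1.0 - a) * b for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(foreground, background, alpha)
    ]


# Toy 2x2 example: white foreground over black background.
foreground = [[1.0, 1.0], [1.0, 1.0]]
background = [[0.0, 0.0], [0.0, 0.0]]
# Fractional alpha values model soft edges such as hair or fur,
# which is the kind of detail a matting model has to recover.
alpha = [[1.0, 0.5], [0.25, 0.0]]

result = composite(foreground, background, alpha)
print(result)
```

Fractional alpha values are what distinguish matting from binary segmentation: a pixel can belong partly to the foreground and partly to the background.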
ViTMatte Visits Over Time
Monthly Visits: 488,643,166
Bounce Rate: 37.28%
Pages per Visit: 5.7
Visit Duration: 00:06:37