Vision Mamba
An efficient framework for visual representation learning based on Bi-directional State Space Models
Common Product, Image, Computer Vision, Deep Learning
Vision Mamba is an efficient visual representation learning framework built from bi-directional Mamba blocks. It is designed to overcome the computational and memory constraints that Transformer-style models face in high-resolution image understanding. It does not rely on self-attention: it marks the image patch sequence with position embeddings and compresses the visual representation with a bi-directional state space model. On ImageNet classification, COCO object detection, and ADE20K semantic segmentation, the framework is reported to outperform established vision Transformers such as DeiT, while running 2.8 times faster and saving 86.8% of GPU memory during batch inference on 1248×1248 images.
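The bi-directional state space idea described above can be illustrated with a toy sketch. This is not the Vision Mamba implementation: it assumes a fixed, diagonal linear recurrence (real Mamba blocks use input-dependent parameters, gating, and convolutions), it shares one set of parameters across both directions for brevity, and the names `ssm_scan`, `bidirectional_ssm`, and `add_position_embeddings` are hypothetical. It only shows how a sequence of patch tokens can be summarized by a forward scan and a backward scan whose cost grows linearly with sequence length.

```python
def add_position_embeddings(tokens, pos):
    """Add a position embedding to each patch token (elementwise)."""
    return [[t + p for t, p in zip(tok, pe)] for tok, pe in zip(tokens, pos)]


def ssm_scan(xs, a, b, c):
    """Toy diagonal linear state space recurrence (an assumption, not Mamba's
    selective scan): h_t = a * h_{t-1} + b * x_t, y_t = c * h_t per channel."""
    dim = len(xs[0])
    h = [0.0] * dim  # fixed-size hidden state, independent of sequence length
    ys = []
    for x in xs:
        h = [a[i] * h[i] + b[i] * x[i] for i in range(dim)]
        ys.append([c[i] * h[i] for i in range(dim)])
    return ys


def bidirectional_ssm(xs, a, b, c):
    """Scan the sequence forward and backward, then sum the two outputs."""
    fwd = ssm_scan(xs, a, b, c)
    bwd = ssm_scan(xs[::-1], a, b, c)[::-1]  # re-align to original order
    return [[f + g for f, g in zip(fr, br)] for fr, br in zip(fwd, bwd)]


# Three one-channel "patch tokens"; only the first carries signal.
tokens = [[1.0], [0.0], [0.0]]
zero_pos = [[0.0], [0.0], [0.0]]  # placeholder position embeddings
out = bidirectional_ssm(add_position_embeddings(tokens, zero_pos),
                        a=[0.5], b=[1.0], c=[1.0])
print(out)  # [[2.0], [0.5], [0.25]]
```

In this sketch the forward scan carries the first token's signal to later positions with decay, and the backward scan does the same in the opposite direction, so every position can receive context from both sides without an attention matrix.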
Vision Mamba Visits Over Time
Monthly Visits: 515,580,771
Bounce Rate: 37.20%
Pages per Visit: 5.8
Visit Duration: 00:06:42