Vision Mamba

An efficient framework for visual representation learning based on Bi-directional State Space Models

Common Product, Image, Computer Vision, Deep Learning
Vision Mamba (Vim) is an efficient framework for visual representation learning built from bidirectional Mamba blocks. It overcomes the computational and memory constraints that limit Transformer-style models on high-resolution images. Without relying on self-attention, it marks the image patch sequence with position embeddings and compresses the visual representation with a bidirectional state space model. On ImageNet classification, COCO object detection, and ADE20K semantic segmentation, it outperforms established vision Transformers such as DeiT while running 2.8 times faster and using 86.8% less GPU memory on high-resolution inputs.
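To make the idea concrete, the following is a minimal NumPy sketch of a bidirectional state space scan over position-embedded image patches. It is an illustration under simplifying assumptions, not the Vision Mamba implementation: the recurrence here is a fixed diagonal linear SSM with random parameters, whereas Mamba uses input-dependent (selective) parameters along with gating and convolution. The function names `patchify`, `ssm_scan`, `bidirectional_ssm`, and `make_params`, and all sizes, are hypothetical.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into a sequence of flattened patches."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    x = image[:rows * patch, :cols * patch].reshape(rows, patch, cols, patch, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(rows * cols, patch * patch * c)

def ssm_scan(x, a, b, c):
    """Simplified diagonal linear SSM (assumption: not selective like Mamba).

    Recurrence: h_t = a * h_{t-1} + b * x_t,  y_t = sum over state of c * h_t.
    x has shape (L, D); a, b, c have shape (D, N).
    """
    length, dim = x.shape
    h = np.zeros_like(a)
    y = np.empty((length, dim))
    for t in range(length):
        h = a * h + b * x[t][:, None]
        y[t] = (c * h).sum(axis=1)
    return y

def bidirectional_ssm(x, fwd, bwd):
    """Scan the sequence in both directions and sum the two outputs."""
    y_forward = ssm_scan(x, *fwd)
    y_backward = ssm_scan(x[::-1], *bwd)[::-1]  # scan reversed, then restore order
    return y_forward + y_backward

rng = np.random.default_rng(0)

def make_params(dim, state):
    """Random illustrative parameters; decay values kept below 1 for stability."""
    a = rng.uniform(0.5, 0.95, size=(dim, state))
    b = rng.standard_normal((dim, state)) * 0.1
    c = rng.standard_normal((dim, state)) * 0.1
    return a, b, c

image = rng.standard_normal((32, 32, 3))            # toy input image
tokens = patchify(image, patch=8)                    # (16, 192) patch sequence
embed = rng.standard_normal((tokens.shape[1], 24)) * 0.02
pos = rng.standard_normal((tokens.shape[0], 24)) * 0.02
x = tokens @ embed + pos                             # add position embeddings

fwd = make_params(24, 4)
bwd = make_params(24, 4)
out = bidirectional_ssm(x, fwd, bwd)
print(out.shape)
```

Each scan visits every token once, so the cost grows linearly with the number of patches rather than quadratically as in self-attention. The backward scan is what lets an early patch depend on later ones, which a single forward scan cannot provide.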
Vision Mamba Visit Over Time

Monthly Visits: 503,747,431
Bounce Rate: 37.31%
Pages per Visit: 5.7
Visit Duration: 00:06:44

Vision Mamba Alternatives