Vision Mamba (Vim) is an efficient visual representation learning framework built from bidirectional Mamba blocks. It overcomes the computational and memory limitations that Transformer-style architectures face on high-resolution images: instead of relying on self-attention, it marks image patch sequences with position embeddings and compresses the visual representation with a bidirectional state space model. On ImageNet classification, COCO object detection, and ADE20K semantic segmentation, Vim outperforms well-established vision Transformers such as DeiT, while running 2.8 times faster and using 86.8% less GPU memory during high-resolution batch inference.
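
To make the design concrete, below is a minimal, self-contained PyTorch sketch of the core idea: patch tokens plus position embeddings, processed by blocks that scan the sequence both left-to-right and right-to-left with a state space layer. This is an illustrative simplification under stated assumptions, not the authors' implementation: the class and parameter names (`SimpleSSM`, `BidirectionalBlock`, `TinyVim`) are hypothetical, the SSM parameters here are static rather than input-dependent (Mamba's selective scan), and the real Vim block also includes gating, depthwise convolution, a class token, and a hardware-efficient parallel scan instead of the slow Python loop used here.

```python
# Minimal sketch of the bidirectional-SSM idea behind Vision Mamba (Vim).
# Hypothetical simplification for illustration only; not the official model.
import torch
import torch.nn as nn


class SimpleSSM(nn.Module):
    """A toy diagonal linear state space layer, scanned over the sequence."""

    def __init__(self, dim: int, state_dim: int = 16):
        super().__init__()
        # Static learnable SSM parameters; real Mamba makes B, C, and the
        # step size functions of the input ("selective scan").
        self.A = nn.Parameter(-torch.rand(dim, state_dim))  # log-decay rates
        self.B = nn.Parameter(torch.randn(dim, state_dim) * 0.1)
        self.C = nn.Parameter(torch.randn(dim, state_dim) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim). Recurrence: h_t = exp(A) * h_{t-1} + B * x_t
        b, l, d = x.shape
        decay = torch.exp(self.A)                        # (dim, state), in (0, 1]
        h = x.new_zeros(b, d, self.A.shape[1])
        ys = []
        for t in range(l):
            h = decay * h + self.B * x[:, t, :, None]    # update hidden state
            ys.append((h * self.C).sum(-1))              # read out: y_t = C * h_t
        return torch.stack(ys, dim=1)                    # (batch, seq_len, dim)


class BidirectionalBlock(nn.Module):
    """Scans tokens forward and backward, then merges, as in Vim's block."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fwd = SimpleSSM(dim)
        self.bwd = SimpleSSM(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.norm(x)
        out_f = self.fwd(z)                   # left-to-right scan
        out_b = self.bwd(z.flip(1)).flip(1)   # right-to-left scan
        return x + out_f + out_b              # residual merge of both directions


class TinyVim(nn.Module):
    """Patch embedding + position embeddings + stacked bidirectional blocks."""

    def __init__(self, image_size=224, patch=16, dim=192, depth=4, classes=1000):
        super().__init__()
        n = (image_size // patch) ** 2
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n, dim))  # position embeddings
        self.blocks = nn.Sequential(*[BidirectionalBlock(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, classes)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        x = self.embed(img).flatten(2).transpose(1, 2)   # (B, N, dim) patch tokens
        x = self.blocks(x + self.pos)
        # Mean pooling stands in for Vim's class token, for simplicity.
        return self.head(x.mean(dim=1))


model = TinyVim()
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 1000])
```

Because each scan step touches only a fixed-size hidden state, the cost grows linearly with the number of patches rather than quadratically as in self-attention; scanning in both directions compensates for the fact that image patches, unlike text, have no natural causal order, so every token can aggregate context from the whole image.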