Vision Mamba

An efficient framework for visual representation learning based on Bi-directional State Space Models

Common Product, Image, Computer Vision, Deep Learning

Vision Mamba is an efficient visual representation learning framework built from bi-directional Mamba blocks. It overcomes the computational and memory constraints that limit Transformer-style understanding of high-resolution images. Rather than relying on self-attention, it marks the sequence of image patches with position embeddings and compresses the visual representation with a bi-directional state space model. On ImageNet classification, COCO object detection, and ADE20K semantic segmentation, it outperforms well-established vision Transformers such as DeiT, while running 2.8 times faster and using 86.8% less GPU memory (figures reported for batch inference on 1248×1248 images).
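
To make the idea of a bi-directional state space scan concrete, here is a minimal toy sketch in plain Python. It is not the Vision Mamba implementation: the real model uses input-dependent (selective) parameters, convolutions, and gating, whereas this sketch assumes fixed scalar parameters `a`, `b`, and `c`. The function names `ssm_scan` and `bidirectional_block` are hypothetical and chosen only for illustration.

```python
from typing import List

Sequence = List[List[float]]  # one feature vector per image patch


def ssm_scan(tokens: Sequence, a: float, b: float, c: float) -> Sequence:
    """Run a linear recurrence over the sequence, one channel at a time.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)
    """
    dim = len(tokens[0])
    h = [0.0] * dim
    out = []
    for x in tokens:
        h = [a * h_i + b * x_i for h_i, x_i in zip(h, x)]
        out.append([c * h_i for h_i in h])
    return out


def bidirectional_block(tokens: Sequence, pos_embed: Sequence,
                        a: float = 0.5, b: float = 1.0, c: float = 1.0) -> Sequence:
    """Add position embeddings, scan forward and backward, and sum both passes."""
    # Mark each patch token with its position embedding.
    x = [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, pos_embed)]
    forward = ssm_scan(x, a, b, c)
    # Scan the reversed sequence, then restore the original order.
    backward = ssm_scan(x[::-1], a, b, c)[::-1]
    return [[f + g for f, g in zip(f_row, g_row)]
            for f_row, g_row in zip(forward, backward)]


if __name__ == "__main__":
    patches = [[1.0], [0.0], [0.0]]      # three one-channel "patches"
    positions = [[0.0], [0.0], [0.0]]    # zero embeddings keep the arithmetic visible
    print(bidirectional_block(patches, positions, a=0.5))
    # [[2.0], [0.5], [0.25]]
```

Each scan visits every token once, so the cost of this sketch grows linearly with the number of patches, whereas self-attention compares every pair of patches. The backward pass is what lets a patch receive context from patches that come later in the sequence.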

Vision Mamba Visit Over Time

Monthly Visits: 474,564,576

Bounce Rate: 36.20%

Pages per Visit: 6.1

Visit Duration: 00:06:34

Vision Mamba Alternatives

Thera — An aliasing-free arbitrary-scale super-resolution method.

MIDI — Generates high-fidelity 3D scenes from a single image using a multi-instance diffusion model.

diffusion-e2e-ft — Fine-tuning tool for image-conditioned diffusion models

DUSt3R — Dense 3D reconstruction without camera calibration information

UniRef++ — A unified model for image and video object segmentation

HunyuanVideo-I2V — An image-to-video generation framework from Tencent, built on HunyuanVideo.

UniTok — A unified visual tokenizer for visual generation and understanding.

VisoMaster — Powerful video replacement and editing software that utilizes AI technology for natural effects.

MatAnyone — A stable video matting framework that supports target specification and handles complex backgrounds.

Video Depth Anything — Consistent depth estimation for super-long videos

leapfusion-hunyuan-image2video — A novel image-to-video sampling technology based on the Hunyuan model, enabling high-quality video generation.

STAR — A spatio-temporal enhancement framework that, for the first time, integrates powerful text-to-video diffusion priors into real-world video super-resolution.

TryOffAnyone — Generates flat, laid-out garment images from photos of dressed individuals.

StableAnimator — A high-quality portrait animation synthesis tool with identity preservation.

LLaMA-Mesh — Unified 3D Mesh Generation with Language Models

face_anon_simple — Facial anonymization technology that retains key details while effectively protecting privacy.

Watermark Anything — Image watermarking technology that can embed localized watermark information within images.

Flux.1 Lite — An 8B-parameter transformer model distilled from FLUX.1-dev, designed for efficient text-to-image generation.

Long-LRM — Efficient 3D Gaussian reconstruction model for fast large-scale scene reconstruction

PuLID-Flux ComfyUI Implementation — PuLID-Flux implementation for ComfyUI

StableDelight — Removes specular reflections to reveal hidden textures.

Colorful Diffuse Intrinsic Image Decomposition — A technique that decomposes images in natural environments into reflectance and lighting effects.

opencv_contrib — A repository of extra modules for OpenCV, used for developing and testing new image processing functionality.

OpenCV — Open Source Computer Vision Library

Open-MAGVIT2 — Open-source autoregressive visual generation model project

Shangchen Zhou — A blog website focused on research and innovation in the fields of computer vision and machine learning.

AWPortrait-FL — An advanced portrait generation model based on FLUX.1-dev

Show-o — A unified transformer for multimodal understanding and generation.

SF3D — Quickly generate textured 3D models