VMamba is a visual state-space model that combines the advantages of convolutional neural networks (CNNs) and vision Transformers (ViTs), achieving linear computational complexity without sacrificing global perception. It introduces the Cross-Scan Module (CSM) to address direction sensitivity, the issue that arises when an order-dependent 1D selective scan is applied to non-causal 2D image data, by traversing the feature map along multiple scan routes. VMamba demonstrates strong performance across a range of visual perception tasks, and its advantages over existing baseline models become more pronounced as image resolution increases.
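
To make the cross-scan idea concrete, below is a minimal sketch of the traversal step: the 2D feature map is unrolled into four 1D sequences (row-major, column-major, and their reverses) so that each position can aggregate context from all four directions. The function name, tensor layout, and shapes are illustrative assumptions for this sketch, not the official VMamba API.

```python
import torch

def cross_scan(x: torch.Tensor) -> torch.Tensor:
    """Unfold a (B, C, H, W) feature map into four 1D scan sequences.

    Illustrative sketch of the Cross-Scan Module's traversal step;
    names and shapes are assumptions, not the official implementation.
    """
    B, C, H, W = x.shape
    row_wise = x.flatten(2)                  # (B, C, H*W), row-major order
    col_wise = x.transpose(2, 3).flatten(2)  # column-major order
    # Stack both routes with their reversed counterparts -> 4 directions.
    return torch.stack(
        [row_wise, col_wise, row_wise.flip(-1), col_wise.flip(-1)], dim=1
    )                                        # (B, 4, C, H*W)

# Example: a 1x1x2x2 map [[0, 1], [2, 3]] yields the scan orders
# [0,1,2,3], [0,2,1,3], [3,2,1,0], and [3,1,2,0].
x = torch.arange(4.0).view(1, 1, 2, 2)
print(cross_scan(x).squeeze())
```

Each of the four sequences can then be processed by a 1D selective scan and the results merged back onto the 2D grid, which is how the scan's inherent order dependence is compensated for.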