Zamba2-mini is a small language model released by Zyphra, designed specifically for on-device and edge applications. It achieves evaluation scores and performance comparable to larger models while maintaining a minimal memory footprint (<700MB). Thanks to 4-bit quantization, its weights occupy roughly 7x less memory than a full-precision model of the same parameter count, while retaining the same performance characteristics. Zamba2-mini also excels at inference efficiency, with faster time-to-first-token, lower memory overhead, and reduced generation latency compared to larger models such as Phi3-3.8B. The model weights are open-sourced under the Apache 2.0 license, enabling researchers, developers, and companies to build on its capabilities and push the boundaries of efficient foundation models.
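To see where a sub-700MB footprint comes from, the memory arithmetic can be sketched directly. This is an illustrative back-of-the-envelope calculation, not Zyphra's published methodology: the parameter count (~1.2B, inferred from the model family's public naming) and the zero-overhead assumption are simplifications, and real quantized checkpoints carry extra metadata such as scales and zero-points.

```python
def weight_footprint_mb(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the stored weights in MiB.

    Ignores quantization metadata (scales, zero-points) and
    runtime buffers such as the KV cache, so real footprints
    are somewhat larger.
    """
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / (1024 ** 2)

# Assumed parameter count (~1.2B); not an official figure.
N_PARAMS = 1.2e9

fp32_mb = weight_footprint_mb(N_PARAMS, 32)  # full precision baseline
int4_mb = weight_footprint_mb(N_PARAMS, 4)   # 4-bit quantized weights

print(f"fp32: {fp32_mb:.0f} MiB")   # ~4578 MiB
print(f"int4: {int4_mb:.0f} MiB")   # ~572 MiB, under the 700MB figure
```

Under these assumptions the 4-bit weights come in around 572 MiB, consistent with the stated <700MB footprint; the 8x ideal ratio from 32-bit to 4-bit shrinks toward the ~7x cited once quantization overhead is included.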