Pixtral 12B is a multimodal AI model developed by the Mistral AI team. It comprehends natural images and documents, showcasing exceptional capabilities in multimodal task processing while also maintaining state-of-the-art performance in text benchmarks. The model supports various image sizes and aspect ratios and can process an arbitrary number of images within a long context window. It is an upgraded version of Mistral Nemo 12B, specifically designed for multimodal inference without sacrificing critical text processing abilities.