Mistral-Nemo-Instruct-2407 is a large language model (LLM) jointly trained by Mistral AI and NVIDIA; it is an instruction-tuned version of Mistral-Nemo-Base-2407. Trained on multilingual and code data, it significantly outperforms existing models of similar or smaller size. Its main features include multilingual and code training data, a 128k context window, and the ability to serve as a drop-in replacement for Mistral 7B. The architecture comprises 40 layers, a model dimension of 5,120, a head dimension of 128, a hidden (feed-forward) dimension of 14,336, 32 attention heads, 8 key-value heads (GQA), a vocabulary of 2^17 entries (about 128k), and rotary embeddings (theta = 1M). The model performs well on a range of benchmarks, including HellaSwag (0-shot), Winogrande (0-shot), and OpenBookQA (0-shot).
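
As a minimal sketch of how the instruct model could be used for chat-style generation, assuming it is published under the Hugging Face ID `mistralai/Mistral-Nemo-Instruct-2407` and accessed through the standard `transformers` text-generation pipeline (the exact model ID and recommended sampling settings should be verified against the official model card):

```python
# Minimal sketch: chat-style generation with Hugging Face transformers.
# The model ID and the temperature value below are assumptions taken from
# common usage; confirm both on the official model card before relying on them.
import torch
from transformers import pipeline

chatbot = pipeline(
    "text-generation",
    model="mistralai/Mistral-Nemo-Instruct-2407",
    torch_dtype=torch.bfloat16,  # half-precision weights to reduce GPU memory
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Summarize the advantages of a 128k context window."},
]

# The pipeline applies the model's chat template to the message list
# before generation, so no manual prompt formatting is needed.
result = chatbot(messages, max_new_tokens=256, do_sample=True, temperature=0.3)
print(result[0]["generated_text"][-1]["content"])
```

A relatively low temperature is used here because instruction-tuned Mistral models are generally recommended to be sampled conservatively; the large 2^17-entry vocabulary and 128k context window require no special handling on the caller's side, as both are encoded in the model's configuration.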