Mistral-Nemo-Instruct-2407

A large language model trained on multilingual and code data

Mistral-Nemo-Instruct-2407 is a large language model (LLM) trained jointly by Mistral AI and NVIDIA. It is the instruction-tuned version of Mistral-Nemo-Base-2407. The model was trained on multilingual and code data and significantly outperforms existing models of similar or smaller size. Its main features are training on a large proportion of multilingual and code data, a 128k context window, and the ability to serve as a drop-in replacement for Mistral 7B. The architecture has 40 layers, a model dimension of 5120, a head dimension of 128, a hidden (feed-forward) dimension of 14336, 32 attention heads, 8 KV heads (grouped-query attention, GQA), a vocabulary of 2^17 tokens (about 128K), and rotary embeddings (theta = 1M). The model performs well on a range of benchmarks, including HellaSwag (0-shot), Winogrande (0-shot), and OpenBookQA (0-shot).
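To make the architecture figures above more concrete, here is a minimal Python sketch that derives a few quantities from them: the GQA group size, the query and key/value projection widths, and a rough KV-cache footprint. The class and function names (`NemoConfig`, `gqa_group_size`, `kv_cache_bytes_per_token`) are hypothetical and exist only for this illustration. The sketch also assumes that "128k" means 131,072 tokens and that cache values are stored in 16 bits; neither assumption is stated in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NemoConfig:
    """Hypothetical container for the figures quoted in the description."""
    n_layers: int = 40
    dim: int = 5120
    head_dim: int = 128
    n_heads: int = 32
    n_kv_heads: int = 8
    vocab_size: int = 2 ** 17
    context_len: int = 128 * 1024  # assumption: "128k" = 131,072 tokens


def gqa_group_size(cfg: NemoConfig) -> int:
    # Number of query heads that share one key/value head under GQA.
    return cfg.n_heads // cfg.n_kv_heads


def kv_cache_bytes_per_token(cfg: NemoConfig, bytes_per_value: int = 2) -> int:
    # Each layer stores one K and one V tensor per token, each holding
    # n_kv_heads * head_dim values. bytes_per_value=2 assumes fp16/bf16.
    return 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * bytes_per_value


cfg = NemoConfig()
q_width = cfg.n_heads * cfg.head_dim      # 4096, narrower than dim (5120)
kv_width = cfg.n_kv_heads * cfg.head_dim  # 1024
per_token = kv_cache_bytes_per_token(cfg)
full_context_gib = per_token * cfg.context_len / 2 ** 30

print("vocabulary size:", cfg.vocab_size)               # 131072
print("query heads per KV head:", gqa_group_size(cfg))  # 4
print("query / KV projection width:", q_width, kv_width)
print("KV cache per token (bytes):", per_token)         # 163840
print("KV cache at full context (GiB):", full_context_gib)  # 20.0
```

Under these assumptions, using 8 KV heads instead of 32 makes the cache a quarter of the size it would be if every query head had its own key/value head, which matters most at long context lengths.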

Mistral-Nemo-Instruct-2407 Visit Over Time

Monthly Visits: 20899836
Bounce Rate: 46.04%
Pages per Visit: 5.2
Visit Duration: 00:04:57
