Mistral-Nemo-Base-2407

12B-parameter large language model

Mistral-Nemo-Base-2407 is a 12B-parameter large language model pre-trained jointly by Mistral AI and NVIDIA. Trained on multilingual and code data, it significantly outperforms existing models of the same or smaller size. Its main features include:

- Released under the Apache 2.0 license
- Available in both pre-trained base and instruction-tuned versions
- Trained with a 128k context window
- Trained on multilingual and code data
- Designed as a drop-in replacement for Mistral 7B

Architecturally, the model has 40 layers, a model dimension of 5,120, a head dimension of 128, a feed-forward hidden dimension of 14,336, 32 attention heads, 8 KV heads (grouped-query attention), a vocabulary of roughly 128k tokens, and rotary embeddings (θ = 1M). The model performs well on multiple benchmarks, such as HellaSwag, Winogrande, and OpenBookQA.
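As a quick illustration of how the base model can be used for text generation, here is a minimal sketch with Hugging Face Transformers. It assumes the checkpoint is published on the Hugging Face Hub as mistralai/Mistral-Nemo-Base-2407 and that a recent transformers release (with Mistral-Nemo support) plus accelerate are installed; treat it as a sketch under those assumptions, not official usage documentation.

```python
# Minimal sketch: load the base model and complete a text prompt with
# Hugging Face Transformers. Assumes the Hub id
# "mistralai/Mistral-Nemo-Base-2407" and a transformers version with
# Mistral-Nemo support; device_map="auto" additionally requires accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Base-2407"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the dtype stored in the checkpoint
    device_map="auto",   # spread weights across available devices
)

# This is a base (non-instruct) checkpoint, so prompt it with plain text
# to be completed rather than with a chat template.
inputs = tokenizer("Mistral NeMo is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note on the grouped-query attention figures quoted above: with 32 query heads sharing 8 KV heads, each KV head serves 32 / 8 = 4 query heads, which shrinks the KV cache roughly 4x relative to standard multi-head attention and helps keep inference memory manageable over the 128k context window.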

Mistral-Nemo-Base-2407 Visit Over Time

Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Visit Duration: 00:05:32
