Mistral-Nemo-Base-2407
A 12B-parameter large language model
Tags: large language model, text generation
Mistral-Nemo-Base-2407 is a 12B-parameter large language model pre-trained jointly by Mistral AI and NVIDIA. It was trained on multilingual and code data and significantly outperforms existing models of the same or smaller size. Its main features include: release under the Apache 2.0 license, availability in both pre-trained and instruction-tuned versions, training with a 128k-token context window, strong multilingual and code coverage, and positioning as a drop-in replacement for Mistral 7B. The architecture comprises 40 layers, a model dimension of 5,120, a head dimension of 128, a hidden (feed-forward) dimension of 14,336, 32 attention heads, 8 KV heads (grouped-query attention), a vocabulary of roughly 128k tokens, and rotary embeddings (θ = 1M). The model performs well on multiple benchmarks, such as HellaSwag, Winogrande, and OpenBookQA.
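As a sanity check, the architecture numbers above can be turned into a back-of-the-envelope parameter count. This sketch assumes a Mistral-style SwiGLU feed-forward block (three weight matrices) and an untied output head; neither detail is stated on this page, so treat them as assumptions.

```python
# Rough parameter-count estimate from the architecture described above.
# Layer count, dimensions, and head counts come from the text; the SwiGLU
# MLP and untied LM head are assumptions typical of Mistral-style models.

n_layers = 40
d_model = 5120
head_dim = 128
n_heads = 32
n_kv_heads = 8          # grouped-query attention (GQA)
d_hidden = 14336
vocab = 131072          # "roughly 128k"

# Attention: Q and O projections span all 32 heads; K and V span the 8 KV heads.
attn = 2 * (d_model * n_heads * head_dim) + 2 * (d_model * n_kv_heads * head_dim)
# MLP: assumed SwiGLU block with three d_model x d_hidden matrices.
mlp = 3 * d_model * d_hidden
per_layer = attn + mlp

# Token embedding plus an (assumed) untied language-model head.
embeddings = 2 * vocab * d_model

total = n_layers * per_layer + embeddings
print(f"{total / 1e9:.1f}B parameters")   # -> 12.2B parameters
```

The estimate ignores normalization weights and biases, yet still lands within about 2% of the advertised 12B, which suggests the listed dimensions are internally consistent.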
Mistral-Nemo-Base-2407 Traffic Over Time
Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Visit Duration: 00:05:32