Nvidia has recently open-sourced two new models: Nemotron-4-Minitron-4B and Nemotron-4-Minitron-8B. The release is more than a technical milestone; it points to a shift toward far more efficient training of large AI models.

Training large AI models traditionally requires vast amounts of data and compute. Nvidia has sharply reduced that demand with efficient training methods, namely structured pruning and knowledge distillation. Compared to training from scratch, the new models require up to 40 times fewer training tokens and cut compute costs by a factor of 1.8. The result builds on Nvidia's deep optimization of the existing Llama 3.1 8B model.


Structured pruning is a neural network compression technique that shrinks a model by removing its least important components. Unlike unstructured pruning, which zeroes out individual weights, structured pruning removes entire neurons, attention heads, or layers, so the remaining weight matrices stay dense and regular and the pruned model runs efficiently on hardware such as GPUs and TPUs.
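To make the idea concrete, here is a minimal sketch of structured pruning on a single feed-forward block, using a simple activation-magnitude importance score. The `MLP` class, the importance criterion, and the calibration batch are illustrative assumptions, not Nvidia's actual pruning recipe.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """A toy feed-forward block whose hidden width we will prune."""
    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.down(torch.relu(self.up(x)))

def prune_mlp_neurons(mlp: MLP, calib_batch: torch.Tensor, keep: int) -> MLP:
    """Remove whole hidden neurons (rows of `up`, matching columns of `down`),
    keeping the `keep` neurons with the largest mean activation on a calibration batch.
    (The importance score here is an assumption for illustration.)"""
    with torch.no_grad():
        acts = torch.relu(mlp.up(calib_batch))            # (batch, d_ff)
        importance = acts.abs().mean(dim=0)               # one score per hidden neuron
        idx = importance.topk(keep).indices.sort().values # neurons to keep, in order

        pruned = MLP(mlp.up.in_features, keep)
        pruned.up.weight.copy_(mlp.up.weight[idx])        # keep selected rows
        pruned.up.bias.copy_(mlp.up.bias[idx])
        pruned.down.weight.copy_(mlp.down.weight[:, idx]) # keep matching columns
        pruned.down.bias.copy_(mlp.down.bias)
    return pruned

# Usage: shrink the hidden width from 2048 to 1024 with a small calibration batch.
mlp = MLP()
calib = torch.randn(64, 512)
smaller = prune_mlp_neurons(mlp, calib, keep=1024)
```

Because entire rows and columns are removed rather than scattered weights, the pruned layer is just a smaller dense matrix multiply, which is why this style of pruning maps well onto GPU and TPU kernels.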

Knowledge distillation trains a smaller student model to mimic a larger teacher model. In Nvidia's practice, logit-based distillation has the student match the teacher's output distribution rather than only the hard training labels, which lets the student retain much of the teacher's capability even with far less training data.
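The following is a minimal sketch of logit-based distillation: the student is trained to minimize the KL divergence between its temperature-softened logits and the teacher's. The training-step wrapper and temperature value are illustrative assumptions, not Nvidia's actual distillation pipeline.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

def distill_step(student, teacher, batch, optimizer):
    """One illustrative training step: teacher frozen, student updated."""
    with torch.no_grad():
        teacher_logits = teacher(batch)
    student_logits = student(batch)
    loss = distillation_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this setup the teacher's full probability distribution carries richer supervision than one-hot labels, which is one reason the student can reach strong performance on a much smaller token budget.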

Trained with structured pruning and knowledge distillation, the Minitron-4B and Minitron-8B models achieve MMLU scores up to 16% higher than comparable models trained from scratch, putting them on par with well-known models such as Mistral 7B, Gemma 7B, and Llama-3 8B. This result validates Nvidia's approach and opens new possibilities for training and deploying large AI models.

Nvidia's open-source initiative not only showcases its leadership in AI technology but also brings valuable resources to the AI community. As AI technology continues to advance, we look forward to seeing more innovative methods that drive AI towards greater efficiency and intelligence.

Model links:

https://huggingface.co/nvidia/Nemotron-4-Minitron-4B-Base

https://huggingface.co/nvidia/Nemotron-4-Minitron-8B-Base