NVIDIA has open-sourced two new large models, Nemotron-4-Minitron-4B and Nemotron-4-Minitron-8B, which use structured pruning and knowledge distillation for efficient training. This significantly reduces training requirements, cutting both data and compute consumption. Compared to training from scratch, the new approach requires up to 40 times fewer training tokens and yields 1.8 times savings in compute cost. Applied to Llama 3.1 8B, structured pruning shrinks the network by removing entire structural components, while knowledge distillation restores the pruned model's performance.
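The two techniques can be sketched in a toy, framework-free example (the function names, the L2-norm importance score, and the toy weights below are illustrative assumptions, not NVIDIA's actual implementation): structured pruning removes whole neurons ranked by an importance score, and knowledge distillation then trains the smaller model to match the larger model's softened output distribution.

```python
import math

def neuron_importance(weights):
    # Score each neuron by the L2 norm of its weight row
    # (one common structured-pruning criterion; illustrative only).
    return [math.sqrt(sum(w * w for w in row)) for row in weights]

def prune_neurons(weights, keep):
    # Structured pruning: drop entire neurons (rows), not individual weights,
    # so the resulting model is genuinely smaller and faster.
    scores = neuron_importance(weights)
    ranked = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [weights[i] for i in kept], kept

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Knowledge distillation: KL(teacher || student) between
    # temperature-softened distributions; minimizing this trains
    # the pruned student to mimic the full-size teacher.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy layer with 4 neurons; keep the 2 most important.
layer = [[0.9, 0.8], [0.1, 0.05], [1.2, -0.7], [0.02, 0.01]]
pruned, kept = prune_neurons(layer, keep=2)
print(kept)  # indices of the retained neurons
```

In a real pipeline the importance scores come from activation statistics over a calibration set, pruning is applied across depth, width, attention heads, and embedding channels, and the distillation loss is minimized over the (much smaller) retraining token budget.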