In an era where technology companies are racing to integrate artificial intelligence into devices, a growing number of small language models (SLMs) have emerged that can run on resource-constrained hardware. Nvidia's research team recently introduced Llama-3.1-Minitron 4B, a compressed version of the Llama 3 model built with state-of-the-art pruning and distillation techniques. The new model is competitive with larger models as well as with similarly sized small models, while being more efficient to train and deploy.

Pruning and distillation are two key techniques for creating smaller, more efficient language models. Pruning removes unimportant parts of a model: "depth pruning" drops entire layers, while "width pruning" removes specific elements such as neurons, attention heads, and embedding channels. Model distillation, in turn, transfers the knowledge and capabilities of a large "teacher model" to a smaller, simpler "student model".
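To make the distinction concrete, the sketch below shows, in plain PyTorch rather than Nvidia's actual tooling, one simple way to express the two pruning styles. The function names and the choice of which layers or neurons to keep are purely illustrative assumptions.

```python
# Illustrative sketch only, not Nvidia's pruning code.
import torch.nn as nn

def depth_prune(layers: nn.ModuleList, keep_indices: list) -> nn.ModuleList:
    """Depth pruning: keep only the transformer blocks listed in keep_indices."""
    return nn.ModuleList([layers[i] for i in keep_indices])

def width_prune_linear(linear: nn.Linear, keep_out: list) -> nn.Linear:
    """Width pruning (simplified): drop output neurons of a linear layer."""
    pruned = nn.Linear(linear.in_features, len(keep_out), bias=linear.bias is not None)
    pruned.weight.data = linear.weight.data[keep_out].clone()
    if linear.bias is not None:
        pruned.bias.data = linear.bias.data[keep_out].clone()
    return pruned
```

In practice, which layers, heads, or neurons to remove is decided by importance scores measured on calibration data rather than by hand-picked indices as in this toy example.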

There are two main styles of distillation. The first is SDG (synthetic data generation) fine-tuning, where the student model is trained on data generated by the teacher and imitates only its final outputs. The second is classical knowledge distillation, where the student learns not just the teacher's outputs but also its internal signals, such as logits and intermediate activations.
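The classical variant is usually implemented as a soft-label loss between teacher and student. Below is a minimal sketch of such a loss in PyTorch; the temperature value and the Hinton-style formulation are common practice and an assumption here, not Nvidia's specific training recipe.

```python
# Minimal sketch of a classical knowledge-distillation loss (soft logits).
# Temperature and scaling are illustrative assumptions, not Nvidia's recipe.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student distributions."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean reduction plus T^2 scaling is the standard soft-label formulation.
    return F.kl_div(s, t, reduction="batchmean") * (temperature ** 2)
```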

In a previous study, Nvidia researchers used pruning and distillation to compress the Nemotron-4 15B model into an 8-billion-parameter model, and then further into a 4-billion-parameter model. Compared with training from scratch, this process improved scores on the well-known MMLU benchmark by 16% while requiring 40 times fewer training tokens.


This time, Nvidia's team applied the same recipe to create a 4-billion-parameter model from the Llama 3.1 8B model. First, they fine-tuned the unpruned 8B model on a dataset of 94 billion tokens to correct for the distribution shift between its original training data and the distillation dataset. They then applied depth pruning and width pruning separately, producing two different versions of Llama-3.1-Minitron 4B.
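Put together, the reported workflow can be summarized as the placeholder sketch below. The function names and data structures are hypothetical and only wire the described steps into sequence; they do not reflect Nvidia's actual APIs.

```python
# Hypothetical end-to-end sketch of the described recipe; names are placeholders.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    params_b: float  # parameter count in billions

def teacher_correction(model: Model) -> Model:
    # Step 1: fine-tune the unpruned 8B model on ~94B tokens so the teacher's
    # distribution matches the distillation dataset.
    return Model(f"{model.name}-corrected", model.params_b)

def prune(model: Model, strategy: str) -> Model:
    # Step 2: "depth" drops whole layers; "width" trims neurons/heads/channels.
    return Model(f"{model.name}-{strategy}-pruned", 4.0)

def distill(student: Model, teacher: Model) -> Model:
    # Step 3: train the pruned student to mimic the corrected teacher.
    return Model(f"{student.name}-distilled", student.params_b)

teacher = teacher_correction(Model("Llama-3.1-8B", 8.0))
minitron_variants = [distill(prune(teacher, s), teacher) for s in ("depth", "width")]
```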

The researchers fine-tuned the pruned models using NeMo-Aligner and evaluated their capabilities in instruction following, role-playing, retrieval-augmented generation (RAG), and function calling.

The results showed that, despite being trained on far less data, Llama-3.1-Minitron 4B performs comparably to other strong small models. The width-pruned version has been released on Hugging Face under a license that permits commercial use, so more users and developers can benefit from its efficiency and strong performance.
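For readers who want to try the released width-pruned checkpoint, a minimal loading snippet with the transformers library might look like the following. The repository id is an assumption on our part, so verify it against the Hugging Face model card; a recent transformers release and the accelerate package are assumed for `device_map="auto"`.

```python
# Usage sketch; the repo id below is an assumption -- confirm it on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Minitron-4B-Width-Base"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Pruning and distillation let small models", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```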


Official Blog: https://developer.nvidia.com/blog/how-to-prune-and-distill-llama-3-1-8b-to-an-nvidia-llama-3-1-minitron-4b-model/

Key Points:

🌟 Llama-3.1-Minitron 4B is a small language model Nvidia built with pruning and distillation techniques, offering efficient training and deployment.

📈 The model was trained on 40 times fewer tokens than training from scratch would require, yet delivered significant performance improvements.

🔓 The width-pruned version has been released on Hugging Face, enabling commercial use and further development.