Last night, Meta announced the open-source release of its latest large language model, Llama 3.1 405B. The announcement caps roughly a year of preparation, from project planning to final review, as the Llama 3 series models are finally unveiled to the public.

Llama 3.1 405B is a multilingual model with tool-use capabilities and 405 billion parameters. After pre-training with an 8K context length, the model underwent continued training at a 128K context length. According to Meta, it performs on par with industry-leading models such as GPT-4 across multiple tasks.


Compared to previous Llama models, Meta has made optimizations in several areas:

  1. Improved the pre-processing and curation process for pre-training data
  2. Enhanced the quality assurance and filtering methods for post-training data

Pre-training the 405B model was a significant challenge, involving 15.6 trillion tokens and roughly 3.8×10^25 floating-point operations. To meet it, Meta optimized the entire training stack and used over 16,000 H100 GPUs.
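Those two figures are consistent with the common rule of thumb that training a dense transformer costs about 6 × parameters × tokens floating-point operations. A quick back-of-the-envelope check (an approximation, not a number Meta reports):

```python
# Rough training-compute estimate using the standard ~6 * params * tokens
# rule of thumb for dense transformers (an approximation, not Meta's figure).
params = 405e9    # 405B parameters
tokens = 15.6e12  # 15.6T training tokens

flops = 6 * params * tokens
print(f"{flops:.2e}")  # 3.79e+25, matching the reported 3.8x10^25
```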

To support large-scale production inference for the 405B model, Meta quantized it from 16-bit (BF16) down to 8-bit (FP8) numerics, significantly lowering compute requirements and enabling the model to run on a single server node.
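The announcement does not detail Meta's FP8 recipe. As a conceptual sketch, the core idea behind scaled low-bit quantization is to divide each tensor by a scale so its values fit an 8-bit range, then multiply the scale back in at use time. The toy scheme below uses int8-style codes purely for illustration; Meta's approach uses actual FP8 formats:

```python
# Toy per-tensor scaled quantization: map float weights into an 8-bit range
# by dividing by a scale, then recover approximate values by multiplying back.
# Illustrative only -- Meta's FP8 pipeline uses FP8 floating-point formats,
# not the int8-style codes shown here.

def quantize(weights, qmax=127):
    scale = max(abs(w) for w in weights) / qmax  # per-tensor scale factor
    codes = [round(w / scale) for w in weights]  # 8-bit integer codes
    return codes, scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

weights = [0.12, -0.5, 0.33, 1.0, -0.97]
codes, scale = quantize(weights)
approx = dequantize(codes, scale)
# Each recovered value lands within half a quantization step of the original.
```

The payoff is that the 8-bit codes halve memory traffic versus BF16, which is what lets the 405B model fit on a single server node.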

Additionally, Meta leveraged the 405B model to improve the post-training quality of the 70B and 8B models. During post-training, the team refined the chat models through multiple rounds of alignment, including supervised fine-tuning (SFT), rejection sampling, and direct preference optimization (DPO). Notably, most SFT examples were synthetically generated.
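Of the techniques listed, DPO has a particularly compact objective: it raises the policy's preference margin between a chosen and a rejected response relative to a frozen reference model. The sketch below computes the textbook DPO loss for one preference pair from log-probabilities; the beta value and inputs are illustrative, not taken from Meta's training setup:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Textbook DPO objective for a single preference pair.

    Encourages the policy to widen the gap between the chosen and rejected
    responses, measured relative to a frozen reference model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1 / (1 + math.exp(-margin)))  # -log(sigmoid(margin))

# Illustrative log-probs: the policy slightly prefers the chosen answer.
loss = dpo_loss(-12.0, -15.0, -13.0, -14.5, beta=0.1)
```

Because the loss depends only on log-probabilities of already-sampled responses, DPO avoids training a separate reward model, which is part of why it fits well in an iterative multi-round alignment pipeline.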

Llama 3 also integrates image, video, and voice capabilities, using a compositional approach that lets the model recognize images and videos and support voice interaction. However, these features are still in development and have not been officially released.

Meta has also updated the licensing agreement, allowing developers to use the outputs of the Llama models to improve other models.

Meta's researchers stated, "It is incredibly exciting to work on the forefront of AI alongside top industry talent and to publish our research transparently and openly. We look forward to seeing the innovation brought about by open-source models and the potential of future Llama series models!"

This open-source initiative is undoubtedly set to bring new opportunities and challenges to the AI field, propelling the advancement of large language model technology.