Last night, Meta announced the open-source release of its latest large language model, Llama 3.1 405B. The announcement caps roughly a year of preparation, from project planning to final review, as the Llama 3 series models are finally unveiled to the public.

Llama 3.1 405B is a multilingual model with tool-use capabilities and 405 billion parameters. After pre-training with an 8K context length, the model underwent continued training at a 128K context length. According to Meta, it performs on par with industry-leading models such as GPT-4 across multiple tasks.


Compared to previous Llama models, Meta has made optimizations in several areas:

  1. Improved the pre-processing and curation process for pre-training data
  2. Enhanced the quality assurance and filtering methods for post-training data

Pre-training the 405B model was a significant challenge, involving 15.6 trillion tokens and roughly 3.8×10^25 floating-point operations. To meet it, Meta optimized the entire training stack and used over 16,000 H100 GPUs.
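Those two figures are consistent with the common rule of thumb that training a dense transformer costs about 6 × parameters × tokens floating-point operations. A quick back-of-the-envelope check (an approximation, not a number Meta reports):

```python
# Rough training-compute estimate using the standard ~6 * params * tokens
# rule of thumb for dense transformers (an approximation, not Meta's figure).
params = 405e9    # 405B parameters
tokens = 15.6e12  # 15.6T training tokens

flops = 6 * params * tokens
print(f"{flops:.2e}")  # 3.79e+25, matching the reported 3.8x10^25
```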

To support large-scale production inference for the 405B model, Meta quantized it from 16-bit (BF16) down to 8-bit (FP8) numerics, significantly lowering compute requirements and enabling the model to run on a single server node.
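The announcement does not detail Meta's FP8 recipe. As a conceptual sketch, the core idea behind scaled low-bit quantization is to divide each tensor by a scale so its values fit an 8-bit range, then multiply the scale back in at use time. The toy scheme below uses int8-style codes purely for illustration; Meta's approach uses actual FP8 formats:

```python
# Toy per-tensor scaled quantization: map float weights into an 8-bit range
# by dividing by a scale, then recover approximate values by multiplying back.
# Illustrative only -- Meta's FP8 pipeline uses FP8 floating-point formats,
# not the int8-style codes shown here.

def quantize(weights, qmax=127):
    scale = max(abs(w) for w in weights) / qmax  # per-tensor scale factor
    codes = [round(w / scale) for w in weights]  # 8-bit integer codes
    return codes, scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

weights = [0.12, -0.5, 0.33, 1.0, -0.97]
codes, scale = quantize(weights)
approx = dequantize(codes, scale)
# Each recovered value lands within half a quantization step of the original.
```

The payoff is that the 8-bit codes halve memory traffic versus BF16, which is what lets the 405B model fit on a single server node.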

Additionally, Meta leveraged the 405B model to improve the post-training quality of the 70B and 8B models. During post-training, the team refined the chat models through multiple rounds of alignment, including supervised fine-tuning (SFT), rejection sampling, and direct preference optimization (DPO). Notably, most SFT examples were synthetically generated.
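Of the techniques listed, DPO has a particularly compact objective: it raises the policy's preference margin between a chosen and a rejected response relative to a frozen reference model. The sketch below computes the textbook DPO loss for one preference pair from log-probabilities; the beta value and inputs are illustrative, not taken from Meta's training setup:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Textbook DPO objective for a single preference pair.

    Encourages the policy to widen the gap between the chosen and rejected
    responses, measured relative to a frozen reference model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1 / (1 + math.exp(-margin)))  # -log(sigmoid(margin))

# Illustrative log-probs: the policy slightly prefers the chosen answer.
loss = dpo_loss(-12.0, -15.0, -13.0, -14.5, beta=0.1)
```

Because the loss depends only on log-probabilities of already-sampled responses, DPO avoids training a separate reward model, which is part of why it fits well in an iterative multi-round alignment pipeline.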

Llama 3 also integrates image, video, and voice capabilities, using a compositional approach that lets the model recognize images and videos and support voice interaction. However, these features are still in development and have not been officially released.

Meta has also updated the licensing agreement, allowing developers to use the outputs of the Llama models to improve other models.

Meta's researchers stated, "It is incredibly exciting to work on the forefront of AI alongside top industry talent and to publish our research transparently and openly. We look forward to seeing the innovation brought about by open-source models and the potential of future Llama series models!"

This open-source initiative is undoubtedly set to bring new opportunities and challenges to the AI field, propelling the advancement of large language model technology.