DeepSeek-V3

A Mixture-of-Experts language model with 671 billion parameters.

Chinese Selection, Productivity, Natural Language Processing, Deep Learning
DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated for each token. It adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture, both thoroughly validated in DeepSeek-V2. In addition, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. The model is pre-trained on 14.8 trillion high-quality tokens, followed by supervised fine-tuning and reinforcement learning stages to fully unlock its capabilities. Comprehensive evaluations show that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, the full training of DeepSeek-V3 requires only 2.788 million H800 GPU hours, and the training process is remarkably stable.
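The auxiliary-loss-free load balancing idea can be illustrated with a short, hypothetical sketch: instead of adding a balancing loss, a per-expert bias is added to the routing scores only when selecting the top-k experts, and that bias is nudged down for overloaded experts and up for underloaded ones. The PyTorch snippet below is a minimal illustration under those assumptions, not DeepSeek-V3's actual implementation; the expert count, top-k value, `route` function, and `bias_update_speed` parameter are all illustrative choices.

```python
# Minimal, hypothetical sketch of auxiliary-loss-free top-k expert routing.
# Not DeepSeek-V3's real code: sizes, names, and the update rule are illustrative.
import torch

num_experts, top_k, hidden = 16, 2, 64
bias = torch.zeros(num_experts)                      # per-expert routing bias (no gradients)
gate = torch.nn.Linear(hidden, num_experts, bias=False)

def route(tokens, bias_update_speed=0.01):
    scores = torch.sigmoid(gate(tokens))             # token-to-expert affinity scores
    _, chosen = torch.topk(scores + bias, top_k, dim=-1)  # bias affects *selection* only
    weights = torch.gather(scores, -1, chosen)             # gating weights use raw scores
    weights = weights / weights.sum(dim=-1, keepdim=True)

    # Nudge the bias: lower it for overloaded experts, raise it for underloaded ones.
    load = torch.zeros(num_experts)
    load.scatter_add_(0, chosen.reshape(-1), torch.ones(chosen.numel()))
    bias.add_(bias_update_speed * torch.sign(load.mean() - load))
    return chosen, weights

tokens = torch.randn(8, hidden)                      # a batch of 8 token representations
experts, gates = route(tokens)
print(experts.shape, gates.shape)                    # torch.Size([8, 2]) torch.Size([8, 2])
```

Because the bias only steers which experts are chosen and never enters the gating weights themselves, the load can be evened out without the extra gradients an auxiliary balancing loss would inject into training.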

DeepSeek-V3 Visit Over Time

Monthly Visits: 494,758,773
Bounce Rate: 37.69%
Pages per Visit: 5.7
Visit Duration: 00:06:29

[Charts: DeepSeek-V3 visit trend, visit geography, and traffic sources]
