DeepSeek-V3
A Mixture-of-Experts language model with 671 billion parameters.
Chinese · Selection · Productivity · Natural Language Processing · Deep Learning
DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated per token. It adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. In addition, DeepSeek-V3 pioneers an auxiliary-loss-free load balancing strategy and sets a multi-token prediction training objective for stronger performance. It was pre-trained on 14.8 trillion high-quality tokens, followed by supervised fine-tuning and reinforcement learning stages to fully harness its capabilities. Comprehensive evaluations show that DeepSeek-V3 outperforms other open-source models and achieves performance on par with leading proprietary models. Despite its excellent performance, the complete training of DeepSeek-V3 required only 2.788 million H800 GPU hours, and the training process was remarkably stable.
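To give a feel for the auxiliary-loss-free load balancing idea mentioned above, here is a minimal Python sketch of bias-adjusted top-k expert routing. It assumes positive per-expert affinity scores, a per-expert bias that is used only for expert selection (not for the final gate weights), and a fixed adjustment step; the function names `route_tokens` and `update_bias`, the toy sizes, and the step size are illustrative assumptions, not DeepSeek's actual implementation.

```python
import numpy as np

def route_tokens(scores, bias, top_k=8):
    """Select top_k experts per token.

    The bias is added only for selection; the returned gate weights are
    computed from the unbiased scores (sketch of the auxiliary-loss-free
    balancing idea, under the assumptions stated above).
    """
    biased = scores + bias                                # (tokens, experts)
    chosen = np.argsort(-biased, axis=1)[:, :top_k]       # indices of selected experts
    gates = np.take_along_axis(scores, chosen, axis=1)    # unbiased affinities
    gates = gates / gates.sum(axis=1, keepdims=True)      # normalized gate weights
    return chosen, gates

def update_bias(bias, chosen, n_experts, gamma=0.001):
    """Nudge an expert's bias down if it is overloaded, up if underloaded."""
    load = np.bincount(chosen.ravel(), minlength=n_experts)
    return bias - gamma * np.sign(load - load.mean())

# Toy usage: 16 tokens routed over 64 experts, 8 experts activated per token.
rng = np.random.default_rng(0)
n_tokens, n_experts = 16, 64
scores = rng.random((n_tokens, n_experts))
bias = np.zeros(n_experts)
chosen, gates = route_tokens(scores, bias)
bias = update_bias(bias, chosen, n_experts)
```

Because the balancing signal enters only through the selection bias rather than an auxiliary loss term, routing can be kept balanced without adding a competing objective to the training loss.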
DeepSeek-V3 Visits Over Time
Monthly Visits: 494,758,773
Bounce Rate: 37.69%
Pages per Visit: 5.7
Visit Duration: 00:06:29