DeepSeek-V2-Chat
An efficient and economical language model built on a powerful Mixture-of-Experts (MoE) architecture.
Tags: Common Product, Programming, Language Model, Mixture of Experts
DeepSeek-V2 is a Mixture-of-Experts (MoE) language model with 236B total parameters, of which 21B are activated per token, enabling economical training and efficient inference. Compared with the earlier DeepSeek 67B, DeepSeek-V2 delivers stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting maximum generation throughput by 5.76 times. The model was pretrained on a high-quality corpus of 8.1 trillion tokens and further aligned through supervised fine-tuning (SFT) and reinforcement learning (RL), achieving excellent results on standard benchmarks and open-ended generation evaluations.
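For readers who want to try the chat model directly, the sketch below shows one common way to load and query it with the Hugging Face Transformers library, assuming the checkpoint is published under the repository name deepseek-ai/DeepSeek-V2-Chat and that trust_remote_code is needed for its custom MoE architecture; details such as quantization, multi-GPU placement, and generation settings will vary with your hardware.

```python
# Minimal sketch: loading and prompting DeepSeek-V2-Chat via Hugging Face Transformers.
# Assumes the checkpoint is available as "deepseek-ai/DeepSeek-V2-Chat"; the full 236B
# weights require a multi-GPU setup even in bfloat16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-V2-Chat"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # lower memory than fp32; still very large
    trust_remote_code=True,      # custom MoE modeling code shipped with the repo
    device_map="auto",           # spread layers across available GPUs
)

# Build a chat-style prompt with the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain Mixture-of-Experts in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```

Note that although only about 21B parameters are activated per token, all 236B parameters must still reside in GPU memory, so the memory savings of the MoE design show up mainly in compute and KV-cache size rather than in the weight footprint.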
DeepSeek-V2-Chat Visits Over Time
Monthly Visits: 20,899,836
Bounce Rate: 46.04%
Pages per Visit: 5.2
Visit Duration: 00:04:57