Chinese artificial intelligence company DeepSeek recently released a groundbreaking open-source large language model, DeepSeek V3. With 671 billion parameters, the model not only surpasses Meta's Llama 3.1 in scale but also outperforms mainstream closed-source models, including GPT-4, on several benchmarks.
A standout feature of DeepSeek V3 is that its strong performance came out of an unusually efficient development process. The model outperformed rivals on problems from the competitive-programming platform Codeforces and led the field on the Aider Polyglot test, which measures a model's ability to write code that integrates correctly into existing codebases. It was trained on an enormous dataset of 14.8 trillion tokens, and its 671 billion parameters are roughly 1.6 times the 405 billion of Llama 3.1.
Remarkably, DeepSeek completed training in about two months at a cost of roughly $5.5 million, far below the investment typically required for comparable models.
DeepSeek is backed by the Chinese quantitative hedge fund High-Flyer Capital Management, which built a server cluster of 10,000 Nvidia A100 GPUs valued at approximately $138 million. High-Flyer's founder, Liang Wenfeng, has said that open-source AI will ultimately break the monopoly of today's closed models.
DeepSeek V3 is released under a permissive license that allows developers to download, modify, and use it for a wide range of applications, including commercial ones. Running the full model still requires substantial hardware, but the release marks a significant step forward for open innovation in AI.
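For readers who want to experiment, the sketch below shows one plausible way to load and prompt the model with the Hugging Face transformers library. The repository name deepseek-ai/DeepSeek-V3, the prompt, and the generation settings here are illustrative assumptions rather than official instructions from DeepSeek, and the full 671-billion-parameter checkpoint needs a multi-GPU server, not a consumer machine.

```python
# Illustrative sketch: loading an open checkpoint with Hugging Face transformers.
# Assumes the weights are published under "deepseek-ai/DeepSeek-V3" and that
# enough GPU memory is available to hold the full model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halve memory use versus float32
    device_map="auto",           # shard layers across available GPUs
    trust_remote_code=True,      # the repo ships custom architecture code
)

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In practice, developers without server-class hardware would more likely run a quantized variant or call a hosted API, but the permissive license is what makes self-hosting like this possible at all.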