DeepSeek quietly released its latest large language model, DeepSeek-V3-0324, causing a significant stir in the AI industry. This massive 641GB model appeared on the AI model repository Hugging Face with almost no prior announcement, continuing the company's understated yet impactful release style.
Performance Leap: Rivaling Claude 3.5 Sonnet
DeepSeek-V3-0324's release is noteworthy not only for the model's capabilities but also for how it is being distributed and licensed. Early testers report significant advances across the board.
AI researcher Xeophon stated on X (formerly Twitter) that DeepSeek V3 showed a "massive leap across all metrics in all tests" in internal benchmarks, claiming it is now the "best non-reasoning model, surpassing Sonnet 3.5". If this claim holds up under wider scrutiny, DeepSeek's new model would outperform Anthropic's highly regarded commercial AI system, Claude 3.5 Sonnet.
Open-Source and Commercial Use: Breaking Down Paywalls
Unlike Claude 3.5 Sonnet, which requires a paid subscription, the model weights for DeepSeek-V3-0324 are completely free for anyone to download and use.
Crucially, the model is licensed under the MIT License, meaning it's freely available for commercial use. This open approach stands in stark contrast to the common practice of Western AI companies placing their models behind paywalls.
MoE Architecture and Two Breakthroughs
DeepSeek-V3-0324's efficiency starts with its architecture. The model employs a Mixture of Experts (MoE) design, which fundamentally changes how large language models operate. Unlike traditional models that activate all of their parameters for every token, DeepSeek's approach activates only about 37 billion of its 685 billion parameters for any given token. This selective activation represents a major shift in model efficiency, achieving performance comparable to much larger, fully activated models while drastically reducing computational demands.
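To make the routing idea concrete, here is a minimal sketch of top-k expert routing in PyTorch. It illustrates the general MoE technique only, not DeepSeek's implementation; the layer sizes, expert count, top-k value, and class name are all assumptions chosen for readability.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: route each token to its top-k experts."""

    def __init__(self, dim: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(n_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                           # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # keep only top-k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                     # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer(dim=64)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])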
Furthermore, the model incorporates two additional breakthroughs: Multi-Head Latent Attention (MLA) and Multi-Token Prediction (MTP). MLA improves the model's ability to maintain context across long texts, while MTP lets it generate several tokens per step rather than the usual one. Together, these innovations increase output speed by nearly 80%.
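The gist of multi-token prediction fits in a few lines. The sketch below, again in PyTorch, adds one extra output head per future position so a single forward pass proposes several upcoming tokens; it shows the generic technique only, and DeepSeek's actual MTP module differs in its details.

```python
import torch
import torch.nn as nn

class MultiTokenHeads(nn.Module):
    """Toy multi-token prediction: one head per future position, so one
    forward pass proposes several upcoming tokens instead of just the next."""

    def __init__(self, dim: int, vocab_size: int, n_future: int = 2):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(dim, vocab_size) for _ in range(n_future)]
        )

    def forward(self, hidden: torch.Tensor) -> list[torch.Tensor]:
        # heads[i] predicts the token (i + 1) steps ahead of each position
        return [head(hidden) for head in self.heads]

heads = MultiTokenHeads(dim=64, vocab_size=32000)
logits = heads(torch.randn(10, 64))
print(len(logits), logits[0].shape)  # 2 torch.Size([10, 32000])
```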
Hardware-Friendly, Local Execution: Accessible on Consumer-Grade Devices
Developer tools creator Simon Willison noted in a blog post that a 4-bit quantized version reduces storage to 352GB, making it possible to run on high-end consumer hardware like a Mac Studio with an M3 Ultra chip.
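The storage saving is easy to sanity-check with back-of-envelope arithmetic, assuming the reported 685 billion parameters are stored at 4 bits each; the small gap to the reported 352GB is plausibly quantization metadata and a few layers kept at higher precision.

```python
params = 685e9                     # reported total parameter count
gb_at_4bit = params * 4 / 8 / 1e9  # 4 bits = half a byte per parameter
print(f"~{gb_at_4bit:.0f} GB")     # ~342 GB, close to the reported 352GB
```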
AI researcher Awni Hannun wrote on social media: "The new DeepSeek-V3-0324 runs at >20 tokens/sec on a 512GB M3 Ultra with mlx-lm!". While a $9,499 Mac Studio might stretch the definition of "consumer-grade hardware," running such a large model locally contrasts sharply with the data-center-scale infrastructure the latest AI models typically require.
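For readers who want to try this, the mlx-lm Python API boils down to a load-and-generate pair. The sketch below assumes the community 4-bit conversion lives at `mlx-community/DeepSeek-V3-0324-4bit` on Hugging Face; verify the repo name before running.

```python
# Requires an Apple silicon Mac with enough unified memory: pip install mlx-lm
# The repo name below is an assumption; check Hugging Face for the current
# mlx-community 4-bit conversion of DeepSeek-V3-0324.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/DeepSeek-V3-0324-4bit")
text = generate(
    model, tokenizer,
    prompt="Explain mixture-of-experts models in two sentences.",
    max_tokens=200,
    verbose=True,  # streams output and reports generation speed
)
```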
The Mac Studio consumes less than 200 watts during inference, while traditional AI infrastructure often relies on multiple Nvidia GPUs consuming thousands of watts.
Shift in Style: A More Technical Focus
Early users report a noticeable shift in the model's communication style. While previous DeepSeek models were praised for their conversational, human-like tone, V3-0324 exhibits a more formal, technically focused style.
Some users on Reddit lamented this change, finding the new version "less human-like" than earlier iterations. The shift likely reflects a deliberate design choice by DeepSeek's engineers to reposition the model for professional and technical applications.
DeepSeek's release strategy highlights a fundamental difference in AI business philosophies between Chinese and Western companies. US leaders like OpenAI and Anthropic keep their models behind paywalls, while Chinese AI companies increasingly favor permissive open-source licensing.
This openness is rapidly transforming China's AI ecosystem, enabling startups, researchers, and developers to innovate on top of advanced AI technology without massive capital expenditures. Chinese tech giants, including Baidu, Alibaba, and Tencent, are also releasing or planning to release open-source AI models. With limited access to cutting-edge Nvidia chips, Chinese companies' focus on efficiency and optimization has become a potential competitive advantage.
DeepSeek-V3-0324's release is also widely seen as laying the foundation for the company's next-generation reasoning model, DeepSeek-R2.
Considering Nvidia CEO Jensen Huang's recent remark that DeepSeek's R1 model "consumes 100x more compute than non-reasoning AI," DeepSeek's achievement of this level of performance under resource constraints is remarkable.
If DeepSeek-R2 follows the trajectory of R1, it could pose a direct challenge to OpenAI's rumored upcoming GPT-5. DeepSeek's open, resource-efficient strategy versus OpenAI's closed, capital-intensive approach represents two competing visions for the future of artificial intelligence.
Currently, users can download the full model weights from Hugging Face or access DeepSeek-V3-0324 through API platforms such as OpenRouter. DeepSeek's own chat interface may also be updated to the new version. DeepSeek's open strategy is redefining the global AI landscape, heralding a more open and accessible era of AI innovation.
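Since OpenRouter exposes an OpenAI-compatible endpoint, a minimal client call looks like the sketch below. The model identifier "deepseek/deepseek-chat-v3-0324" is an assumption based on OpenRouter's naming conventions; check the OpenRouter catalog for the current listing.

```python
# OpenRouter is OpenAI-compatible, so the standard client works: pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)
resp = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3-0324",  # assumed id; verify before use
    messages=[{"role": "user", "content": "What changed in DeepSeek-V3-0324?"}],
)
print(resp.choices[0].message.content)
```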