DeepSeek-V3
A Mixture-of-Experts language model with 671 billion parameters.
Chinese · Selection · Productivity · Natural Language Processing · Deep Learning
DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated per token. It adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. In addition, DeepSeek-V3 pioneers an auxiliary-loss-free load balancing strategy and sets a multi-token prediction training objective for stronger performance. It was pre-trained on 14.8 trillion high-quality tokens, followed by supervised fine-tuning and reinforcement learning stages to fully harness its capabilities. Comprehensive evaluations show that DeepSeek-V3 outperforms other open-source models and achieves performance on par with leading proprietary models. Despite its excellent performance, the complete training of DeepSeek-V3 required only 2.788 million H800 GPU hours, and the training process was remarkably stable.
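To give a feel for the auxiliary-loss-free load balancing idea mentioned above, here is a minimal Python sketch of bias-adjusted top-k expert routing. It assumes positive per-expert affinity scores, a per-expert bias that is used only for expert selection (not for the final gate weights), and a fixed adjustment step; the function names `route_tokens` and `update_bias`, the toy sizes, and the step size are illustrative assumptions, not DeepSeek's actual implementation.

```python
import numpy as np

def route_tokens(scores, bias, top_k=8):
    """Select top_k experts per token.

    The bias is added only for selection; the returned gate weights are
    computed from the unbiased scores (sketch of the auxiliary-loss-free
    balancing idea, under the assumptions stated above).
    """
    biased = scores + bias                                # (tokens, experts)
    chosen = np.argsort(-biased, axis=1)[:, :top_k]       # indices of selected experts
    gates = np.take_along_axis(scores, chosen, axis=1)    # unbiased affinities
    gates = gates / gates.sum(axis=1, keepdims=True)      # normalized gate weights
    return chosen, gates

def update_bias(bias, chosen, n_experts, gamma=0.001):
    """Nudge an expert's bias down if it is overloaded, up if underloaded."""
    load = np.bincount(chosen.ravel(), minlength=n_experts)
    return bias - gamma * np.sign(load - load.mean())

# Toy usage: 16 tokens routed over 64 experts, 8 experts activated per token.
rng = np.random.default_rng(0)
n_tokens, n_experts = 16, 64
scores = rng.random((n_tokens, n_experts))
bias = np.zeros(n_experts)
chosen, gates = route_tokens(scores, bias)
bias = update_bias(bias, chosen, n_experts)
```

Because the balancing signal enters only through the selection bias rather than an auxiliary loss term, routing can be kept balanced without adding a competing objective to the training loss.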
DeepSeek-V3 Visits Over Time
Monthly Visits: 494,758,773
Bounce Rate: 37.69%
Pages per Visit: 5.7
Visit Duration: 00:06:29