Skywork-MoE

A high-performance MoE model with 146 billion parameters

Premium · New Product · Programming · MoE model · Large language model
Skywork-MoE is a high-performance Mixture of Experts (MoE) model with 146 billion total parameters, 16 experts, and 22 billion activated parameters per token. The model is initialized from the dense checkpoint of the Skywork-13B model and incorporates two training techniques: gating logit normalization, which enhances expert diversification, and adaptive auxiliary loss coefficients, which allow the auxiliary loss weight to be tuned per layer. Skywork-MoE delivers performance comparable to or better than models with more parameters or more activated parameters, such as Grok-1, DBRX, Mixtral 8x22B, and DeepSeek-V2.
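
The two techniques named above can be pictured with a short sketch. The following is a minimal, illustrative PyTorch example based only on the description here, not Skywork's released implementation; the names (`NormalizedTopKGate`, `update_aux_loss_coeff`), the scaling factor `lam`, the top-2 routing, and the drop-rate-based update rule are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedTopKGate(nn.Module):
    """Top-k MoE gate with gating logit normalization (illustrative sketch).

    The raw gating logits are standardized per token (zero mean, unit variance)
    and rescaled by a temperature-like factor before the softmax, which sharpens
    the routing distribution and encourages expert diversification.
    """
    def __init__(self, hidden_dim: int, num_experts: int = 16, top_k: int = 2, lam: float = 1.0):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, num_experts, bias=False)
        self.top_k = top_k
        self.lam = lam  # assumed scaling hyperparameter

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, hidden_dim)
        logits = self.proj(x)                                  # (num_tokens, num_experts)
        mean = logits.mean(dim=-1, keepdim=True)
        std = logits.std(dim=-1, keepdim=True)
        normed = self.lam * (logits - mean) / (std + 1e-6)     # gating logit normalization
        probs = F.softmax(normed, dim=-1)
        top_p, top_idx = probs.topk(self.top_k, dim=-1)        # route each token to its top-k experts
        return top_p, top_idx, probs


def update_aux_loss_coeff(coeff: float, drop_rate: float,
                          target_drop_rate: float = 0.01, step: float = 1e-3,
                          min_coeff: float = 1e-3, max_coeff: float = 1e-1) -> float:
    """Hypothetical per-layer update for an adaptive auxiliary loss coefficient.

    The load-balancing loss weight for a layer is nudged up when that layer's
    routing is imbalanced (high token drop rate) and nudged down otherwise,
    so each layer ends up with its own coefficient.
    """
    if drop_rate > target_drop_rate:
        return min(coeff + step, max_coeff)
    return max(coeff - step, min_coeff)
```

In a full MoE layer, `probs` would also feed the load-balancing auxiliary loss, and the per-layer weight of that loss is what `update_aux_loss_coeff` would adjust between training steps under the assumptions stated above.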

Skywork-MoE Visit Over Time

Monthly Visits: 474,564,576

Bounce Rate: 36.20%

Pages per Visit: 6.1

Visit Duration: 00:06:34

Skywork-MoE Visit Trend

Skywork-MoE Visit Geography

Skywork-MoE Traffic Sources

Skywork-MoE Alternatives