Skywork-MoE
A high-performance MoE model with 146 billion parameters
Premium, New Product, Programming, MoE model, Large language model
Skywork-MoE is a high-performance Mixture-of-Experts (MoE) model with 146 billion total parameters, 16 experts, and 22 billion activated parameters. The model is initialized from the dense checkpoint of Skywork-13B and incorporates two innovative techniques: gating logit normalization, which enhances expert diversification, and adaptive auxiliary loss coefficients, which allow the auxiliary loss to be tuned per layer. Skywork-MoE matches or outperforms models with more total or activated parameters, such as Grok-1, DBRX, Mixtral 8x22B, and DeepSeek-V2.
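To make the gating logit normalization idea concrete, the sketch below shows one plausible reading of the technique: standardize the router's logits per token and apply a tunable scale before the softmax, so the routing distribution becomes sharper and experts specialize more cleanly. This is a minimal illustration based on the description above, not Skywork's released code; the function name, the scale and eps hyperparameters, and the hidden size are assumptions.

import torch
import torch.nn.functional as F

def normalized_gating(logits: torch.Tensor, scale: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """Standardize gating logits per token, then scale before the softmax.

    A sharper (higher-scale) routing distribution pushes the router to
    commit more strongly to a few experts, which is the stated goal of
    gating logit normalization (expert diversification). `scale` and
    `eps` are illustrative names, not the official hyperparameters.
    """
    mean = logits.mean(dim=-1, keepdim=True)
    std = logits.std(dim=-1, keepdim=True)
    normalized = (logits - mean) / (std + eps)
    return F.softmax(scale * normalized, dim=-1)

# Example: route a small batch of token representations to 16 experts.
tokens = torch.randn(4, 4608)          # hypothetical hidden size
router = torch.nn.Linear(4608, 16)     # 16 experts, as in Skywork-MoE
gate_probs = normalized_gating(router(tokens), scale=1.0)
top_values, top_experts = gate_probs.topk(2, dim=-1)  # typical top-2 selection

In the same spirit, the adaptive auxiliary loss coefficients described above would amount to weighting each layer's load-balancing loss by its own coefficient rather than a single global one.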
Skywork-MoE Visits Over Time
Monthly Visits: 474,564,576
Bounce Rate: 36.20%
Pages per Visit: 6.1
Visit Duration: 00:06:34