Skywork-MoE-Base
A high-performance Mixture-of-Experts (MoE) model with 146 billion parameters
Skywork-MoE-Base is a high-performance Mixture-of-Experts (MoE) model with 146 billion total parameters, comprising 16 experts and activating 22 billion parameters per token. The model is initialized from the dense checkpoint of Skywork-13B and introduces two techniques: gating logit normalization, which enhances expert diversity, and adaptive auxiliary loss coefficients, which allow the load-balancing loss to be tuned per layer. Skywork-MoE matches or exceeds the performance of models with more total or activated parameters on a range of popular benchmarks.
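To make the gating logit normalization idea concrete, here is a minimal NumPy sketch: the router's logits are standardized per token (zero mean, unit variance) and rescaled by a temperature-like factor before the softmax, which controls how sharp the expert distribution is. The function name, the `scale` parameter, and the exact normalization details are illustrative assumptions, not Skywork's actual implementation.

```python
import numpy as np

def gate_with_logit_norm(logits, scale=1.0, eps=1e-6):
    """Sketch of gating logit normalization (assumed form, not the official code).

    logits: array of shape (tokens, num_experts) from the router.
    scale:  hypothetical factor controlling gate sharpness after normalization.
    """
    # Standardize logits per token: zero mean, unit variance across experts.
    mean = logits.mean(axis=-1, keepdims=True)
    std = logits.std(axis=-1, keepdims=True)
    normed = scale * (logits - mean) / (std + eps)
    # Numerically stable softmax over the normalized logits.
    shifted = normed - normed.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)
```

Because the logits are forced to a fixed variance before the softmax, no single expert's raw logit can dominate purely through scale, which is one plausible mechanism for the improved expert diversity the description mentions.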
Skywork-MoE-Base Visits Over Time
Monthly Visits: 20,899,836
Bounce Rate: 46.04%
Pages per Visit: 5.2
Visit Duration: 00:04:57