Skywork-MoE-Base
A high-performance mixture-of-experts (MoE) model with 146 billion parameters
Skywork-MoE-Base is a high-performance mixture-of-experts (MoE) model with 146 billion total parameters, comprising 16 experts and activating 22 billion parameters per token. The model is initialized from the dense checkpoint of the Skywork-13B model and introduces two innovative techniques: gating logit normalization, which enhances expert diversity, and adaptive auxiliary loss coefficients, which allow the auxiliary loss to be tuned per layer. On a range of popular benchmarks, Skywork-MoE performs comparably to or better than models with more total or activated parameters.
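As a rough illustration of the first technique, the sketch below standardizes each token's router logits (zero mean, unit variance), rescales them by a tunable factor, and only then applies the softmax, sharpening the routing distribution. This is a minimal sketch based on the description above; the function name, the scale hyperparameter, and the top-2 routing are illustrative assumptions, not Skywork's released code.

```python
import torch
import torch.nn.functional as F

def normalized_gating(logits: torch.Tensor, scale: float = 1.0,
                      top_k: int = 2, eps: float = 1e-6):
    """Hypothetical gating-logit normalization for MoE routing.

    Standardizes each token's router logits before the softmax, then
    rescales by `scale`; a larger scale yields a sharper, more decisive
    routing distribution, which is the stated goal of the technique.
    """
    mean = logits.mean(dim=-1, keepdim=True)
    std = logits.std(dim=-1, keepdim=True)
    normed = (logits - mean) / (std + eps) * scale
    probs = F.softmax(normed, dim=-1)
    # Top-k routing: each token is sent to its k highest-probability experts.
    weights, expert_ids = probs.topk(top_k, dim=-1)
    return weights, expert_ids

# Example: route 4 tokens across 16 experts (as in Skywork-MoE), top 2 active.
router_logits = torch.randn(4, 16)
weights, expert_ids = normalized_gating(router_logits, scale=1.0)
```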
Skywork-MoE-Base Visits Over Time
Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Avg. Visit Duration: 00:05:32