Skywork-MoE-Base-FP8
146B-parameter high-performance MoE model
Common Product, Programming, MoE, Large Model
Skywork-MoE is a high-performance Mixture of Experts (MoE) model with 146 billion total parameters, 16 experts, and 22 billion activated parameters. It is initialized from the dense checkpoint of the Skywork-13B model. Two techniques are introduced: gating logit normalization, which enhances expert diversity, and adaptive auxiliary loss coefficients, which allow the auxiliary loss coefficient to be adjusted per layer. Skywork-MoE performs comparably to or better than models with more total or activated parameters on popular benchmarks such as C-Eval, MMLU, CMMLU, GSM8K, MATH, and HumanEval.
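The two training techniques named above can be illustrated with a small sketch. It is not the Skywork implementation. It assumes the gate normalization (usually called gating logit normalization) standardizes each token's expert logits to zero mean and unit variance, multiplies them by a scale factor, and then applies softmax, and that the per-layer auxiliary loss coefficient follows a moving average driven by that layer's token drop rate. The function names, the top-2 routing, and the values of `scale`, `beta`, and `xi` are all hypothetical choices made for illustration.

```python
import math


def normalize_logits(logits, scale=1.0, eps=1e-6):
    """Standardize one token's expert logits, then rescale (assumed form)."""
    mean = sum(logits) / len(logits)
    var = sum((z - mean) ** 2 for z in logits) / len(logits)
    std = math.sqrt(var)
    return [scale * (z - mean) / (std + eps) for z in logits]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, top_k=2, scale=1.0):
    """Return the indices and gate probabilities of the top_k experts."""
    probs = softmax(normalize_logits(logits, scale))
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    return chosen, [probs[i] for i in chosen]


def update_aux_coefficient(alpha, drop_rate, beta=0.9, xi=0.2):
    """Moving-average update of one layer's auxiliary loss coefficient.

    Assumption: a higher token drop rate in a layer pushes that layer's
    coefficient up, and a lower drop rate lets it decay.
    """
    return beta * alpha + (1.0 - beta) * xi * drop_rate


if __name__ == "__main__":
    token_logits = [0.1, 0.2, 3.0, -1.0]  # made-up logits for 4 experts
    experts, gates = route(token_logits, top_k=2, scale=1.0)
    print("chosen experts:", experts)
    print("gate probabilities:", gates)
    print("next coefficient:", update_aux_coefficient(0.01, drop_rate=0.1))
```

Under these assumptions, a larger `scale` sharpens the gate distribution, so each token commits more strongly to its top experts, while a smaller `scale` spreads probability more evenly across experts.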
Skywork-MoE-Base-FP8 Visit Over Time
Monthly Visits: 17,788,201
Bounce Rate: 44.87%
Pages per Visit: 5.4
Visit Duration: 00:05:32