Skywork-MoE-Base-FP8

146B parameter high-performance MoE model

Tags: Common Product, Programming, MoE, Large Model
Skywork-MoE is a 146-billion-parameter high-performance Mixture-of-Experts (MoE) model with 16 experts and 22 billion activated parameters. The model is initialized from the dense checkpoint of Skywork-13B. It introduces two novel training techniques: gating logit normalization, which enhances expert diversification, and adaptive auxiliary loss coefficients, which allow the auxiliary loss coefficient to be adjusted per layer. Skywork-MoE performs comparably to, or better than, models with more total or activated parameters on popular benchmarks such as C-Eval, MMLU, CMMLU, GSM8K, MATH, and HumanEval.
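The gating technique mentioned above can be illustrated with a small sketch. This is not Skywork's actual implementation; it shows one plausible form of gating logit normalization: standardize the router's logits and rescale them by a temperature-like factor (`lam`, a hypothetical hyperparameter here) before the softmax, which sharpens the gate distribution and pushes tokens toward more distinct experts.

```python
import numpy as np

def gating_logit_normalization(logits, lam=1.0):
    """Normalize router logits before softmax (illustrative sketch).

    logits: (batch, num_experts) raw gate scores
    lam:    hypothetical scaling factor; larger values yield a sharper
            expert distribution.
    """
    mu = logits.mean(axis=-1, keepdims=True)
    sigma = logits.std(axis=-1, keepdims=True) + 1e-6  # avoid div-by-zero
    z = lam * (logits - mu) / sigma
    # numerically stable softmax over the expert dimension
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Route one token across 16 experts, keeping the top-2 (a common MoE setup)
logits = np.random.randn(1, 16)
probs = gating_logit_normalization(logits, lam=2.0)
top2 = np.argsort(probs[0])[-2:]  # indices of the 2 selected experts
```

Because the logits are standardized before the softmax, `lam` directly controls how concentrated the routing probabilities are, independent of the raw logit scale.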

Skywork-MoE-Base-FP8 Visits Over Time

Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Visit Duration: 00:05:32
