Yuan2.0-M32
An Efficient Mixture-of-Experts Language Model with Attention Routing
Yuan2.0-M32 is a mixture-of-experts (MoE) language model with 32 experts, of which 2 are active per token. It introduces a novel routing network, attention routing, that improves expert selection efficiency and yields a 3.8% gain in accuracy. The model is trained from scratch on 2000B tokens, with a training compute budget only 9.25% of that required by a dense model of the same parameter scale. Using just 3.7B active parameters and a per-token forward cost of only 7.4 GFLOPS, 1/19 of what Llama3-70B demands, it delivers competitive performance in coding, mathematics, and various specialized domains. It surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks, reaching accuracies of 55.9% and 95.8%, respectively.
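The core idea of attention routing is to score experts with a scaled dot-product between the token's hidden state and learned per-expert embeddings, then activate only the top-2 experts. The sketch below is a minimal illustration of that top-k routing step; the function name, shapes, and use of plain NumPy are assumptions for illustration, not the model's actual implementation.

```python
import numpy as np

def attention_router(x, expert_keys, top_k=2):
    """Illustrative top-k expert routing via scaled dot-product scoring.

    x           : (d,)   token hidden state
    expert_keys : (E, d) learned per-expert embeddings (hypothetical)
    Returns the indices of the top_k selected experts and their
    renormalized mixing weights.
    """
    d = x.shape[0]
    # Attention-style logits: one score per expert, scaled by sqrt(d).
    scores = expert_keys @ x / np.sqrt(d)        # (E,)
    # Softmax over all E experts (numerically stabilized).
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    # Keep only the top_k experts and renormalize their weights,
    # so the 2 active experts' contributions sum to 1.
    top = np.argsort(probs)[::-1][:top_k]
    weights = probs[top] / probs[top].sum()
    return top, weights

rng = np.random.default_rng(0)
idx, w = attention_router(rng.standard_normal(64),
                          rng.standard_normal((32, 64)))
```

With 32 experts and `top_k=2`, only the two selected experts' feed-forward blocks run for each token, which is how the model keeps its active parameter count at 3.7B despite a much larger total parameter count.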