Yuan2.0-M32

An Efficient Mixture-of-Experts Language Model with Attention Routing

Yuan2.0-M32 is a Mixture of Experts (MoE) language model with 32 experts, of which 2 are active per token. It introduces a novel routing network, the Attention Router, to improve expert selection, yielding a 3.8% accuracy gain over a classical router network. The model is trained from scratch on 2000B tokens, at a training compute cost of only 9.25% of that required by a dense model of the same parameter scale. It delivers competitive performance in coding, mathematics, and various specialized fields while using only 3.7B active parameters, and its forward pass requires just 7.4 GFLOPs per token, 1/19 of what Llama3-70B demands. It surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks, reaching accuracies of 55.9% and 95.8%, respectively.
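
The sketch below illustrates, in PyTorch, how a top-2-of-32 attention-style router could score experts per token and keep the two highest-scoring ones, followed by a quick arithmetic check of the 7.4 GFLOPs and "1/19 of Llama3-70B" figures using the common approximation of 2 FLOPs per active parameter per token for a forward pass. The class name, projection layers, and dimensions are illustrative assumptions, not the actual Yuan2.0-M32 implementation.

```python
# Minimal sketch (not the official implementation) of an attention-style
# MoE router that picks the top-2 of 32 experts per token. All module
# names, dimensions, and the scoring formula are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionRouter(nn.Module):
    def __init__(self, d_model: int = 2048, num_experts: int = 32, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Learnable per-expert embeddings act as attention "keys".
        self.expert_keys = nn.Parameter(torch.randn(num_experts, d_model))
        # Project the token hidden state into a routing "query".
        self.query_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, seq, d_model)
        query = self.query_proj(hidden)                       # (B, S, D)
        # Attention-style dot-product scores between each token and each expert.
        scores = query @ self.expert_keys.t()                 # (B, S, E)
        scores = scores / self.expert_keys.shape[-1] ** 0.5
        # Keep the 2 highest-scoring experts per token and renormalize
        # their weights so they sum to 1.
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)   # (B, S, 2)
        gate = F.softmax(top_vals, dim=-1)
        return top_idx, gate                                   # expert ids, mixing weights

# Usage example on a dummy batch.
router = AttentionRouter()
ids, weights = router(torch.randn(1, 4, 2048))
print(ids.shape, weights.shape)       # torch.Size([1, 4, 2]) for both

# Rough sanity check of the compute claim: ~2 FLOPs per active parameter per token.
active_params = 3.7e9
print(2 * active_params / 1e9)        # ~7.4 GFLOPs per token
print(2 * 70e9 / (2 * active_params)) # ~18.9, i.e. about 1/19 of Llama3-70B
```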

Yuan2.0-M32 Visit Over Time

Monthly Visits: 17,788,201
Bounce Rate: 44.87%
Pages per Visit: 5.4
Visit Duration: 00:05:32

Charts: Yuan2.0-M32 Visit Trend, Visit Geography, and Traffic Sources

Yuan2.0-M32 Alternatives