Yuan2-M32-hf-int4

High-performance mixture of experts language model

Programming | Mixture of Experts | Attention Router
Yuan2.0-M32 is a mixture of experts (MoE) language model with 32 experts, of which 2 are active per token. It introduces a new routing network, an attention router, to improve the efficiency of expert selection, yielding a 3.8% accuracy gain over models that use a classical routing network. Yuan2.0-M32 was trained from scratch on 2,000 billion tokens, with a training computation cost of only 9.25% of that required by a dense model of the same parameter scale. It delivers competitive performance in coding, mathematics, and various professional domains while activating only 3.7 billion of its 40 billion total parameters, with a forward computation of 7.4 GFLOPs per token, just 1/19 of Llama3-70B's requirement. On the MATH and ARC-Challenge benchmarks, Yuan2.0-M32 surpassed Llama3-70B, achieving accuracies of 55.9% and 95.8%, respectively.
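To make the routing scheme concrete, the sketch below shows one way a top-2-of-32 attention-style router could be written in PyTorch. The class name AttentionRouter, the hidden size of 2048, and the dot-product scoring against learned expert embeddings are illustrative assumptions; the actual Yuan2.0-M32 routing network differs in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionRouter(nn.Module):
    """Illustrative sketch: score 32 experts per token, keep the top 2.
    Not the actual Yuan2.0-M32 implementation."""

    def __init__(self, hidden_size=2048, num_experts=32, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One learnable embedding per expert (hypothetical parameterization).
        self.expert_embed = nn.Parameter(torch.randn(num_experts, hidden_size))
        self.query_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states):               # hidden_states: (tokens, hidden)
        queries = self.query_proj(hidden_states)     # (tokens, hidden)
        # Attention-style dot-product scores between tokens and expert embeddings.
        scores = queries @ self.expert_embed.t()     # (tokens, num_experts)
        scores = scores / (self.expert_embed.shape[-1] ** 0.5)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)
        # Normalize the two selected experts' weights so they sum to 1 per token.
        gate = F.softmax(top_vals, dim=-1)           # (tokens, top_k)
        return gate, top_idx


router = AttentionRouter()
tokens = torch.randn(4, 2048)        # 4 token representations
gate, expert_ids = router(tokens)    # gate: (4, 2), expert_ids: (4, 2)
```

Each token ends up with two mixing weights and two expert indices; only the selected experts run and their outputs are combined with those weights, which is how the active parameter count stays at 3.7 billion despite the 40 billion total.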

Yuan2-M32-hf-int4 Visit Over Time

Monthly Visits: 19,075,321

Bounce Rate: 45.07%

Pages per Visit: 5.5

Visit Duration: 00:05:32

Yuan2-M32-hf-int4 Visit Trend

Yuan2-M32-hf-int4 Visit Geography

Yuan2-M32-hf-int4 Traffic Sources

Yuan2-M32-hf-int4 Alternatives