Yuan2-M32-hf-int4
High-performance mixture of experts language model
Common Product | Programming | Mixture of Experts | Attention Router
Yuan2.0-M32 is a mixture-of-experts (MoE) language model with 32 experts, of which 2 are active per token. It introduces a new routing network, an attention router, to improve expert selection, yielding a 3.8% accuracy gain over a model using a classical routing network. Yuan2.0-M32 was trained from scratch on 2,000 billion tokens, with a training computation cost of only 9.25% of that of a dense model at the same parameter scale. It demonstrates competitive performance in coding, mathematics, and various specialized fields while using only 3.7 billion active parameters out of 40 billion in total, and its forward computation is just 7.4 GFLOPs per token, roughly 1/19th of Llama3-70B's requirement. On the MATH and ARC-Challenge benchmarks, Yuan2.0-M32 surpassed Llama3-70B, achieving accuracies of 55.9% and 95.8%, respectively.
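To make the routing idea concrete, below is a minimal PyTorch sketch of an attention-style router that picks 2 of 32 experts per token. It is an illustrative assumption of how such a router could be wired up (layer names, the d_router size, and the per-expert feature projection are hypothetical), not the actual Yuan2.0-M32 implementation.

import torch
import torch.nn as nn

class AttentionRouter(nn.Module):
    """Illustrative attention-style MoE router (hypothetical sketch):
    per-expert features attend to each other before the top-k experts
    are selected, so expert choices can reflect inter-expert correlation."""

    def __init__(self, d_model: int, n_experts: int = 32, top_k: int = 2, d_router: int = 16):
        super().__init__()
        self.n_experts = n_experts
        self.top_k = top_k
        self.d_router = d_router
        # One small feature vector per expert, conditioned on the token.
        self.expert_proj = nn.Linear(d_model, n_experts * d_router)
        self.q = nn.Linear(d_router, d_router, bias=False)
        self.k = nn.Linear(d_router, d_router, bias=False)
        self.v = nn.Linear(d_router, 1, bias=False)

    def forward(self, x: torch.Tensor):
        # x: (batch, d_model) -> per-expert features: (batch, n_experts, d_router)
        feats = self.expert_proj(x).view(-1, self.n_experts, self.d_router)
        q, k, v = self.q(feats), self.k(feats), self.v(feats)
        # Attention across the expert dimension, then one logit per expert.
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_router ** 0.5, dim=-1)
        logits = (attn @ v).squeeze(-1)                 # (batch, n_experts)
        probs = torch.softmax(logits, dim=-1)
        weights, idx = torch.topk(probs, self.top_k, dim=-1)   # choose 2 of 32 experts
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize gate weights
        return weights, idx

# Usage example: route a batch of 4 token representations.
router = AttentionRouter(d_model=512)
gate_weights, expert_idx = router(torch.randn(4, 512))
print(expert_idx.shape, gate_weights.shape)  # torch.Size([4, 2]) torch.Size([4, 2])

A classical router would instead score each expert with a single linear projection of the token; the point of the attention variant is that the expert scores are computed jointly rather than independently.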
Yuan2-M32-hf-int4 Visit Over Time
Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Visit Duration: 00:05:32