Yuan2.0-M32-hf-int8
High-Performance Mixture of Experts Language Model
Yuan2.0-M32-hf-int8 is a mixture-of-experts (MoE) language model with 32 experts, of which 2 are active per token. By adopting a new routing network, the attention router, it improves the efficiency of expert selection and achieves 3.8% higher accuracy than models using a classical routing network. Yuan2.0-M32 was trained from scratch on 2,000 billion tokens, and its training compute is only 9.25% of that required by a dense model of the same parameter scale. The model is competitive in programming, mathematics, and various specialized domains while using only 3.7 billion active parameters out of 40 billion in total, and its forward computation requires only 7.4 GFLOPs per token, about 1/19th of what Llama3-70B demands. Yuan2.0-M32 surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks, with accuracies of 55.9% and 95.8%, respectively.
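To make the "2 active out of 32 experts" idea concrete, the sketch below shows a generic top-2 softmax-gated MoE layer in PyTorch. It is an illustration only: the linear gate, layer sizes, and class name are assumptions for the example, not Yuan2.0-M32's actual architecture, and the model's attention router replaces this kind of single linear gate with an attention-based mechanism over the experts (see the Yuan2.0-M32 paper for the exact formulation).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Generic sparse MoE layer: each token is processed by only 2 of 32 experts."""

    def __init__(self, d_model=512, d_ff=1024, num_experts=32, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Classical router: one linear layer scoring all experts per token.
        # (Yuan2.0-M32's attention router is a different, attention-based scorer.)
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                             # (tokens, num_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)  # keep only the 2 best experts
        top_w = F.softmax(top_w, dim=-1)                  # normalize weights over the chosen 2
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e              # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += top_w[mask, slot, None] * expert(x[mask])
        return out

# Usage: route a batch of 8 token vectors through the sparse layer.
layer = Top2MoELayer()
tokens = torch.randn(8, 512)
print(layer(tokens).shape)  # torch.Size([8, 512])
```

Because only 2 expert feed-forward blocks run per token, the per-token compute and active parameter count stay far below the model's total parameter count, which is how a 40B-parameter MoE can keep forward computation near that of a much smaller dense model.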
Yuan2.0-M32-hf-int8 Visit Over Time
Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Visit Duration: 00:05:32