Yuan2.0-M32-hf-int8

High-Performance Mixture of Experts Language Model

Tags: Common Product, Programming, Mixture of Experts Model, Attention Router
Yuan2.0-M32-hf-int8 is a mixture of experts (MoE) language model with 32 experts, of which 2 are active per token. By adopting a new routing network, the attention router, it improves the efficiency of expert selection and delivers a 3.8% accuracy gain over models using classical routing networks. Yuan2.0-M32 was trained from scratch on 2,000 billion tokens, and its training computation demand is only 9.25% of that required by a dense model of the same parameter scale. The model is competitive in programming, mathematics, and various specialized fields while using only 3.7 billion active parameters, a small fraction of its 40 billion total parameters. Forward computation per token requires only 7.4 GFLOPs, just 1/19th of what Llama3-70B demands. Yuan2.0-M32 outperforms Llama3-70B on the MATH and ARC-Challenge benchmarks, achieving accuracy rates of 55.9% and 95.8%, respectively.
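The attention router is the key architectural difference from MoE models that use a plain linear (dot-product) router. The sketch below illustrates the general idea of attention-informed top-2 routing over 32 experts; it is a simplified illustration only, and the class name, dimensions, and scoring layout are assumptions for demonstration, not the model's actual implementation.

```python
# Minimal sketch of attention-based top-2 routing over 32 experts.
# Illustrative simplification, not the exact Yuan2.0-M32 router;
# names, shapes, and the scoring layout here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionRouter(nn.Module):
    def __init__(self, hidden_dim: int = 2048, num_experts: int = 32, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Learned expert embeddings act as keys/values; the token acts as the query.
        self.expert_keys = nn.Parameter(torch.randn(num_experts, hidden_dim))
        self.expert_values = nn.Parameter(torch.randn(num_experts, hidden_dim))
        self.score_proj = nn.Linear(hidden_dim, num_experts)

    def forward(self, x: torch.Tensor):
        # x: (batch, hidden_dim) token representations.
        # Attention over expert embeddings lets expert correlations influence
        # the routing score, unlike an independent linear router.
        scale = self.expert_keys.shape[-1] ** 0.5
        attn = F.softmax(x @ self.expert_keys.t() / scale, dim=-1)
        context = attn @ self.expert_values          # (batch, hidden_dim)
        logits = self.score_proj(x + context)        # (batch, num_experts)
        probs = F.softmax(logits, dim=-1)
        top_probs, top_idx = probs.topk(self.top_k, dim=-1)
        # Renormalize so the two active experts' weights sum to 1.
        top_probs = top_probs / top_probs.sum(dim=-1, keepdim=True)
        return top_idx, top_probs

# Usage: route a batch of 4 tokens to 2 of 32 experts.
router = AttentionRouter()
idx, weights = router(torch.randn(4, 2048))
print(idx.shape, weights.shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```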

Yuan2.0-M32-hf-int8 Visits Over Time

Monthly Visits: 17,104,189
Bounce Rate: 44.67%
Pages per Visit: 5.5
Visit Duration: 00:05:49
