Yuan2-M32-hf-int4
High-performance mixture of experts language model
Tags: Common Product, Programming, Mixture of Experts, Attention Router
Yuan2.0-M32 is a mixture-of-experts (MoE) language model with 32 experts, of which 2 are active per token. It introduces a new routing network, an attention router, that improves the efficiency of expert selection and yields a 3.8% accuracy gain over models using a classical routing network. Yuan2.0-M32 was trained from scratch on 2000 billion tokens, at a training computation cost of only 9.25% of that of a dense model at the same parameter scale. It delivers competitive performance in coding, mathematics, and various professional fields with only 3.7 billion active parameters out of 40 billion total, and a forward computation of just 7.4 GFLOPS per token, only 1/19th of Llama3-70B's requirement. On the MATH and ARC-Challenge benchmarks, Yuan2.0-M32 surpasses Llama3-70B with accuracies of 55.9% and 95.8%, respectively.
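The core idea above, scoring 32 experts with an attention-style mechanism and activating only the top 2, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the projection matrices `W_q` and `W_k` and the function `route` are hypothetical names, and the coupled query-key scoring stands in for the attention router's inter-expert correlation modeling.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, D, TOP_K = 32, 64, 2   # 32 experts, 2 active per token

# Hypothetical router parameters: per-expert query and key projections.
W_q = rng.standard_normal((NUM_EXPERTS, D)) / np.sqrt(D)
W_k = rng.standard_normal((NUM_EXPERTS, D)) / np.sqrt(D)

def route(token):
    """Select TOP_K of NUM_EXPERTS for one token (sketch).

    Unlike a classical router, which scores experts with a single
    linear layer of independent logits, an attention-style router
    couples two projections of the token so that expert scores can
    reflect interactions rather than isolated dot products.
    """
    q = W_q @ token                        # (NUM_EXPERTS,) query scores
    k = W_k @ token                        # (NUM_EXPERTS,) key scores
    scores = q * k / np.sqrt(D)            # coupled, scaled expert scores
    top = np.argsort(scores)[-TOP_K:]      # indices of the 2 chosen experts
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                           # softmax gate weights over the top 2
    return top, w

experts, weights = route(rng.standard_normal(D))
```

Because only `TOP_K` expert feed-forward blocks run per token, the active parameter count and per-token FLOPs stay far below those of a dense model of the same total size, which is the efficiency claim quoted above.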
Yuan2-M32-hf-int4 Visit Over Time
Monthly Visits: 20,899,836
Bounce Rate: 46.04%
Pages per Visit: 5.2
Visit Duration: 00:04:57