Yuan2.0-M32
An Efficient Mixture-of-Experts Language Model with Attention Routing
Yuan2.0-M32 is a mixture-of-experts (MoE) language model with 32 experts, of which 2 are active per token. It introduces a novel routing network, attention routing, that improves expert selection efficiency and yields a 3.8% gain in accuracy. The model is trained from scratch on 2000B tokens, with a training compute budget only 9.25% of that required by a dense model of the same parameter scale. Using just 3.7B active parameters and a per-token forward cost of only 7.4 GFLOPS, 1/19 of what Llama3-70B demands, it delivers competitive performance in coding, mathematics, and various specialized domains. It surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks, reaching accuracies of 55.9% and 95.8%, respectively.
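The core idea of attention routing is to score experts with a scaled dot-product between the token's hidden state and learned per-expert embeddings, then activate only the top-2 experts. The sketch below is a minimal illustration of that top-k routing step; the function name, shapes, and use of plain NumPy are assumptions for illustration, not the model's actual implementation.

```python
import numpy as np

def attention_router(x, expert_keys, top_k=2):
    """Illustrative top-k expert routing via scaled dot-product scoring.

    x           : (d,)   token hidden state
    expert_keys : (E, d) learned per-expert embeddings (hypothetical)
    Returns the indices of the top_k selected experts and their
    renormalized mixing weights.
    """
    d = x.shape[0]
    # Attention-style logits: one score per expert, scaled by sqrt(d).
    scores = expert_keys @ x / np.sqrt(d)        # (E,)
    # Softmax over all E experts (numerically stabilized).
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    # Keep only the top_k experts and renormalize their weights,
    # so the 2 active experts' contributions sum to 1.
    top = np.argsort(probs)[::-1][:top_k]
    weights = probs[top] / probs[top].sum()
    return top, weights

rng = np.random.default_rng(0)
idx, w = attention_router(rng.standard_normal(64),
                          rng.standard_normal((32, 64)))
```

With 32 experts and `top_k=2`, only the two selected experts' feed-forward blocks run for each token, which is how the model keeps its active parameter count at 3.7B despite a much larger total parameter count.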