2025-02-12 14:04:43 · AIbase
ByteDance's UltraMem Architecture Reduces Large Model Inference Costs by 83%
The ByteDance Doubao large model team announced today that it has developed a new sparse model architecture called UltraMem. The architecture addresses the high memory-access cost incurred during inference by MoE (Mixture of Experts) models, improving inference speed by 2 to 6 times over MoE and cutting inference costs by up to 83%. This advance opens a new path toward efficient inference for large models. UltraMem resolves the memory-access bottleneck of MoE inference while maintaining model performance; according to the team, experimental results show these gains hold with the same parameter count and activation conditions as MoE.
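To see why MoE inference is memory-bound in the first place, here is a minimal illustrative sketch (not ByteDance's code, and unrelated to UltraMem's actual design): a token's router selects its top-k experts, and the full weight matrices of each selected expert must then be fetched from memory, so even sparsely activated models trigger large, scattered memory reads per token. All names and sizes below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, chosen only for illustration.
d_model, d_ff = 64, 256
n_experts, top_k = 8, 2

# One FFN weight pair per expert; every expert's weights live in memory.
experts = [(rng.standard_normal((d_model, d_ff)) * 0.02,
            rng.standard_normal((d_ff, d_model)) * 0.02)
           for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route one token through its top-k experts, tracking bytes fetched."""
    logits = x @ router_w
    top = np.argsort(logits)[-top_k:]              # chosen expert ids
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()
    out = np.zeros_like(x)
    bytes_read = 0
    for g, e in zip(gates, top):
        w1, w2 = experts[e]
        out += g * (np.maximum(x @ w1, 0.0) @ w2)  # expert FFN (ReLU)
        bytes_read += w1.nbytes + w2.nbytes        # weights pulled from memory
    return out, bytes_read

x = rng.standard_normal(d_model)
y, nbytes = moe_forward(x)
# 2 experts x 2 matrices x (64*256) float64 values x 8 bytes each
print(nbytes)  # → 524288
```

Each token pays for full weight fetches from its chosen experts, which is the access pattern the article says UltraMem is designed to avoid.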