Huawei Launches New Technology to Optimize Large Model Inference: UCM Technology Alleviates HBM Dependence
On August 12, at the 2025 Finance AI Inference Application Implementation and Development Forum, Huawei released a breakthrough AI inference technology called UCM (Inference Memory Data Manager). The technology is expected to reduce China's reliance on HBM (High Bandwidth Memory) for AI inference and to significantly improve the inference performance of large models in China. UCM is centered on the KV Cache: it integrates multiple cache-acceleration algorithms and manages the memory data generated during inference in a hierarchical manner, expanding the effective context window while delivering high-throughput, low-latency inference.
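The article does not disclose UCM's internals, but the general idea of hierarchical KV-cache management can be sketched as follows. This is a toy illustration under my own assumptions, not Huawei's implementation: hot KV entries live in a small fast tier (standing in for HBM), and least-recently-used entries spill to a larger slow tier (standing in for DRAM or SSD), from which they can be promoted back on access. The class and method names here are hypothetical.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy hierarchical KV-cache manager (illustrative only).

    Hot entries stay in a fixed-capacity fast tier; LRU-evicted
    entries spill to a larger slow tier and are promoted on access.
    """

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()   # token_id -> KV data (hot tier, "HBM")
        self.slow = {}              # overflow tier ("DRAM/SSD")

    def put(self, token_id, kv):
        self.fast[token_id] = kv
        self.fast.move_to_end(token_id)
        # Spill least-recently-used entries once the fast tier is full.
        while len(self.fast) > self.fast_capacity:
            old_id, old_kv = self.fast.popitem(last=False)
            self.slow[old_id] = old_kv

    def get(self, token_id):
        if token_id in self.fast:
            self.fast.move_to_end(token_id)   # refresh recency
            return self.fast[token_id]
        if token_id in self.slow:
            kv = self.slow.pop(token_id)
            self.put(token_id, kv)            # promote back to the fast tier
            return kv
        return None

cache = TieredKVCache(fast_capacity=2)
for t in range(4):                 # cache KV data for 4 tokens
    cache.put(t, f"kv{t}")
# Tokens 0 and 1 were spilled to the slow tier; 2 and 3 remain hot.
print(sorted(cache.fast), sorted(cache.slow))  # [2, 3] [0, 1]
cache.get(0)                       # re-access token 0 -> promoted to fast tier
print(0 in cache.fast)             # True
```

The point of such tiering is that only the working set of the KV cache needs to occupy scarce high-bandwidth memory, which is how a larger context window can be served without proportionally more HBM.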