2024-02-29 11:20:17.AIbase
ByteDance Joins Forces with Peking University to Create MegaScale: Training LLMs on a Single 10,000-GPU Cluster
MegaScale trains LLMs on a single cluster of more than 10,000 GPUs, achieving a Model FLOPs Utilization (MFU) of 55.2%. It also includes diagnostic tools for monitoring system components and events.
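
For readers unfamiliar with the metric, MFU is the ratio of the FLOPs a training run usefully achieves to the theoretical peak of the hardware. The sketch below is a minimal illustration, not MegaScale's own measurement code. It assumes the common approximation of 6 FLOPs per parameter per token for a forward and backward pass (attention FLOPs ignored), and every number in the usage example is a hypothetical placeholder, not a figure from the article.

```python
def model_flops_utilization(params, tokens_per_second, num_gpus, peak_flops_per_gpu):
    """Return MFU as a fraction between 0 and 1.

    Assumption: training costs about 6 * params FLOPs per token
    (forward + backward pass, attention FLOPs ignored).
    """
    if num_gpus <= 0 or peak_flops_per_gpu <= 0:
        raise ValueError("num_gpus and peak_flops_per_gpu must be positive")
    # FLOPs per second the run actually spends on the model
    achieved_flops = 6.0 * params * tokens_per_second
    # FLOPs per second the whole cluster could deliver in theory
    peak_flops = num_gpus * peak_flops_per_gpu
    return achieved_flops / peak_flops


# Hypothetical values chosen only to make the arithmetic easy to follow.
mfu = model_flops_utilization(
    params=1e9,               # hypothetical 1B-parameter model
    tokens_per_second=1e5,    # hypothetical cluster-wide throughput
    num_gpus=8,               # hypothetical GPU count
    peak_flops_per_gpu=1e14,  # hypothetical 100 TFLOP/s peak per GPU
)
print(f"MFU: {mfu:.1%}")  # prints "MFU: 75.0%"
```

Read this way, an MFU of 55.2% means that a little over half of the cluster's theoretical peak compute goes into the model's own arithmetic, with the remainder lost to communication, memory stalls, and other overheads.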