ByteDance's Doubao large model team recently announced a breakthrough on a key bottleneck of the Mixture-of-Experts (MoE) architecture, the communication overhead of moving tokens between distributed experts, and open-sourced the resulting optimization, called COMET. The technique substantially improves large model training efficiency, delivering a 1.7x speedup and cutting training costs by 40%.
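According to the accompanying paper, COMET attacks this bottleneck by overlapping expert communication with expert computation at a fine, kernel-level granularity. The sketch below is only a coarse, chunk-level illustration of that general overlap idea in PyTorch, not ByteDance's implementation; the `expert_ffn` module, the pre-routed `send_chunks`, and the async all-to-all chunking scheme are all illustrative assumptions.

```python
# Illustrative sketch only: chunk-level overlap of MoE all-to-all communication
# with expert computation. COMET itself fuses communication and computation at
# a much finer (kernel) granularity; this is NOT ByteDance's implementation.
# Assumes torch.distributed is already initialized (e.g. via torchrun), that
# tokens have been routed so send_chunks[i] holds the tokens this rank must
# dispatch in chunk i, and that each chunk is evenly divisible across ranks.

import torch
import torch.distributed as dist

def overlapped_moe_forward(send_chunks, expert_ffn, group=None):
    """Dispatch token chunks with async all-to-all while running the expert
    FFN on previously received chunks, so communication hides behind compute."""
    recv_chunks = [torch.empty_like(c) for c in send_chunks]
    outputs = []

    # Kick off communication for the first chunk.
    work = dist.all_to_all_single(recv_chunks[0], send_chunks[0],
                                  group=group, async_op=True)

    for i in range(len(send_chunks)):
        work.wait()  # Make sure chunk i has fully arrived.
        if i + 1 < len(send_chunks):
            # Start moving the next chunk while we compute on this one.
            work = dist.all_to_all_single(recv_chunks[i + 1], send_chunks[i + 1],
                                          group=group, async_op=True)
        # Expert computation on the chunk that just arrived overlaps with
        # the in-flight all-to-all for the next chunk.
        outputs.append(expert_ffn(recv_chunks[i]))

    return torch.cat(outputs, dim=0)
```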
COMET has already been deployed in ByteDance's large-scale multi-GPU cluster training, saving millions of GPU hours. Compared with recent MoE optimization efforts such as DeepSeek's DualPipe, COMET offers better compatibility and ease of use: it drops into existing MoE training frameworks as a plug-in and supports mainstream large models without invasive code changes.
Published results show that COMET accelerates individual MoE layers by 1.96x, yielding an average end-to-end efficiency improvement of 1.71x, and that it performs stably across different parallelization strategies, input scales, and hardware environments. Notably, COMET can also be combined with DeepSeek's DualPipe, potentially reducing model training costs even further.
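For intuition on how a 1.96x per-layer gain translates into a 1.71x end-to-end gain: only the fraction of step time spent inside MoE layers benefits from the per-layer speedup, so the overall effect follows Amdahl's law. The snippet below is a back-of-the-envelope check, not a breakdown from the paper; the assumed MoE time share is purely illustrative.

```python
# Back-of-the-envelope Amdahl's-law check (illustrative, not from the paper):
# if a fraction f of step time is spent in MoE layers and those layers get a
# per-layer speedup s, the end-to-end speedup is 1 / ((1 - f) + f / s).
def end_to_end_speedup(f_moe: float, layer_speedup: float) -> float:
    return 1.0 / ((1.0 - f_moe) + f_moe / layer_speedup)

# An assumed ~85% MoE time share roughly reproduces the reported figures:
# 1.96x per layer -> ~1.71x end to end.
print(round(end_to_end_speedup(0.85, 1.96), 2))  # ~1.71
```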
This open-source release marks a significant step forward for the large model field and promises to accelerate both research on and application of large models.
Paper: https://arxiv.org/pdf/2502.19811
GitHub: https://github.com/bytedance/flux