The Beijing Academy of Artificial Intelligence (BAAI) recently announced that it has worked with ecosystem partners to build and open-source FlagCX, a unified communication library for heterogeneous hardware. The initiative aims to address the challenges communication libraries face in an era of diverse computing hardware and to fill an important gap in the open-source software stack for heterogeneous computing. The move also responds to the special action organized by national authorities to govern typical algorithmic issues on online platforms, reflecting key principles such as correct algorithm orientation, fairness and justice, and openness and transparency.
In an era of diverse computing hardware, communication libraries are the foundational software for aggregating computing resources at scale, and they face two major challenges. First, the variety of vendor communication libraries means that implementations and optimizations of communication algorithms lack generality and adaptability. Second, chips from different vendors cannot be interconnected efficiently. FlagCX aims to tackle both by enabling efficient communication between different chips and large-scale adaptive communication optimization across scenarios.
The architecture of FlagCX follows the principles of "zero overhead" and "zero cost." It provides a unified communication operator interface layer for upper-level applications and shields them from the underlying implementation details. On top of this layer, plugins for various deep learning frameworks let users adopt FlagCX in different frameworks at no additional cost. Throughout the design and implementation of FlagCX, three fundamental principles have been upheld: standardization, compatibility, and adaptability.
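To illustrate what a unified interface layer over vendor-specific backends can look like, here is a minimal sketch in Python. It is not FlagCX's API: the class names, the `unified_all_reduce_sum` function, and the `vendor_a` / `vendor_b` keys are all hypothetical, and the "backends" are in-process stand-ins rather than calls into real communication libraries. It only shows the general adapter idea of one entry point that hides which implementation runs underneath.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class CommBackend(ABC):
    """Hypothetical adapter over one vendor's native communication library."""

    @abstractmethod
    def all_reduce_sum(self, per_rank: List[List[float]]) -> List[List[float]]:
        """Return, for every rank, the element-wise sum of all ranks' buffers."""


class VendorABackend(CommBackend):
    def all_reduce_sum(self, per_rank: List[List[float]]) -> List[List[float]]:
        # Stand-in for a call into vendor A's native library:
        # sum each element position across ranks in one pass.
        total = [sum(vals) for vals in zip(*per_rank)]
        return [list(total) for _ in per_rank]


class VendorBBackend(CommBackend):
    def all_reduce_sum(self, per_rank: List[List[float]]) -> List[List[float]]:
        # Stand-in for vendor B: accumulate rank by rank.
        # Different implementation, same observable result.
        total = [0.0] * len(per_rank[0])
        for buf in per_rank:
            total = [t + v for t, v in zip(total, buf)]
        return [list(total) for _ in per_rank]


# Hypothetical registry mapping a chip type to its backend adapter.
_BACKENDS: Dict[str, CommBackend] = {
    "vendor_a": VendorABackend(),
    "vendor_b": VendorBBackend(),
}


def unified_all_reduce_sum(chip: str, per_rank: List[List[float]]) -> List[List[float]]:
    """Single entry point; callers never touch vendor-specific code."""
    return _BACKENDS[chip].all_reduce_sum(per_rank)
```

In this sketch, application code calls `unified_all_reduce_sum` the same way regardless of the chip, which is the property a framework plugin would rely on.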
Performance tests show that, for cross-machine communication, FlagCX adds nearly zero overhead when adapting to vendors' native communication libraries, and that heterogeneous communication across different chips reaches over 90% of peak bandwidth, demonstrating the potential of cross-chip heterogeneous communication.
In addition, BAAI is building a surrounding software ecosystem to create a positive cycle of collaborative innovation among industry, academia, and research institutions, accelerating the promotion and adoption of heterogeneous unified communication library technology. The first batch of ecosystem partners includes universities, research institutions, server manufacturers, chip manufacturers, cloud vendors, and operators.
The FlagCX source code is available at: https://github.com/FlagOpen/FlagCX