At today's 2024 Baidu Cloud Smart Conference, Shen Dou, Executive Vice President of Baidu Group, announced version 4.0 of the Baidu Baike Computing Platform. The new version supports multi-chip hybrid training and multi-chip adaptation, and it keeps effective training time above 99.5% on a 10,000-card cluster, significantly improving the utilization of computing power.
With computing power currently in short supply, the Baike 4.0 upgrade is intended to help enterprises use computing resources more efficiently and reduce operating costs. The upgrade centers on a stronger "multi-chip hybrid training" capability, which reaches 95% training efficiency on a 10,000-card cluster, a level described as industry-leading.
Baike 4.0 also supports deployment within seconds, cutting the preparation time for a 10,000-card cluster from weeks to just one hour, which improves deployment efficiency and shortens the business launch cycle. To address the frequent failures that occur during large-scale model training, Baike 4.0 has upgraded its failure detection methods and automatic fault tolerance mechanisms. These changes reduce both the frequency of failures and the time needed to handle them, keeping effective training time above 99.5%.
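To put the 99.5% figure in perspective, the short Python sketch below works through the arithmetic. It assumes that "effective training duration" means the share of wall-clock time spent on useful training, which the announcement does not define precisely. The function names and the 30-day run length are hypothetical and chosen only for illustration.

```python
import math


def effective_training_ratio(total_hours: float, lost_hours: float) -> float:
    """Share of wall-clock time spent on useful training.

    Assumed definition: (total time - time lost to failures and
    recovery) / total time. The announcement gives no formal definition.
    """
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= lost_hours <= total_hours:
        raise ValueError("lost_hours must be between 0 and total_hours")
    return (total_hours - lost_hours) / total_hours


def downtime_budget_hours(total_hours: float, target_ratio: float) -> float:
    """Maximum hours that may be lost while still meeting target_ratio."""
    if not 0 <= target_ratio <= 1:
        raise ValueError("target_ratio must be between 0 and 1")
    return total_hours * (1 - target_ratio)


# Hypothetical 30-day training run (720 wall-clock hours).
run_hours = 30 * 24
budget = downtime_budget_hours(run_hours, 0.995)
print(f"Downtime budget at 99.5%: {budget:.1f} hours")
print(f"Ratio check: {effective_training_ratio(run_hours, budget):.4f}")
```

Under this assumed definition, a 30-day run can lose only about 3.6 hours in total to failures and recovery and still stay at 99.5%, which is why faster fault detection and automatic recovery matter at this scale.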
For model inference, Baike 4.0 has been optimized for both speed and cost. The gain is largest in long-text inference, where efficiency has more than doubled, in response to growing market demand.