Volcano Engine recently unveiled a significant innovation at its Video Cloud Technology Conference: a video preprocessing solution for large-scale model training. The solution has already been applied to the Doubao video generation model, marking a major advance in AI video generation.
Tan Dai, President of Volcano Engine, emphasized that AIGC and multimodal technologies are profoundly transforming user experiences. Drawing on TikTok's practical experience, Volcano Engine is actively exploring the integration of AI large models with video technology to provide comprehensive solutions for businesses.
Wang Yue, Head of Video Architecture at TikTok Group, pointed out that large-scale model training faces numerous challenges: the high cost of processing massive amounts of data, uneven sample quality, complex processing pipelines, and the difficulty of scheduling heterogeneous computing resources.
To address these challenges, Volcano Engine built its preprocessing solution on BMF, its self-developed multimedia processing framework, together with heterogeneous computing resources from Intel. The solution is optimized at both the algorithmic and engineering levels, enabling efficient processing of vast amounts of video data and significantly improving model training efficiency.
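For readers unfamiliar with BMF, the sketch below shows what a simple video preprocessing step built on its Python API might look like, following the graph-based decode/filter/encode pattern from BMF's public examples. The file paths, target resolution, and encoder parameters are placeholder assumptions for illustration, not details of Volcano Engine's production pipeline.

```python
import bmf

def preprocess_clip(input_path: str, output_path: str) -> None:
    # Build a BMF processing graph: decode -> scale -> encode.
    graph = bmf.graph()

    # Decode the source clip; the result exposes 'video' and 'audio' streams.
    streams = graph.decode({'input_path': input_path})

    (
        streams['video']
        .scale(1280, 720)  # normalize to a uniform resolution (assumed value)
        .encode(streams['audio'], {
            'output_path': output_path,
            'video_params': {'codec': 'h264'}  # placeholder encoder settings
        })
        .run()  # execute the graph synchronously
    )

if __name__ == '__main__':
    # Placeholder file names for illustration only.
    preprocess_clip('raw_clip.mp4', 'preprocessed_clip.mp4')
```

In a training-data scenario, a step like this would typically be fanned out across many workers, with additional filtering or quality-scoring modules inserted into the graph before encoding.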
Additionally, Volcano Engine has open-sourced BMF lite, a lighter and more versatile edition of the framework for on-device post-processing, with support for edge-side access to large models and operator acceleration.
Notably, the Doubao video generation model PixelDance, released on September 24, has adopted this preprocessing solution. Built on the DiT architecture, the model handles complex interactions among multiple subjects and maintains content consistency across multiple camera cuts. The Doubao video generation model is currently available for enterprise testing through Volcano Engine.