Kuaishou recently launched CineMaster, a new text-to-video generation framework with 3D-aware perception that has been dubbed a video version of ControlNet. CineMaster reportedly lets users precisely control object placement and camera movement in the generated video through a variety of control signals, offering an unusual degree of creative freedom.
The core advantage of CineMaster lies in its control capabilities. Beyond traditional text prompts, users can fine-tune generation by combining the following control signals:
* **Depth Map:** Used to control the depth information of the scene and the spatial relationships of objects.
* **Camera Trajectory:** Precisely specifies the camera's movement path in the video, achieving various complex shot effects.
* **Object Labels:** Used to tag and control the position and behavior of specific objects within the scene.
By combining these control signals, users can achieve precise control over the generated video content, creating more creative and personalized works.
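As a way to picture how such signals might be bundled together, here is a minimal sketch in Python. All of the class names, fields, and the dolly-in helper below are hypothetical illustrations; CineMaster's actual interface has not been published.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical data structures for the three control signals described
# above: a depth map, a camera trajectory, and labeled 3D object boxes.

@dataclass
class CameraPose:
    position: Tuple[float, float, float]  # camera location in world space
    look_at: Tuple[float, float, float]   # point the camera aims at

@dataclass
class ObjectBox3D:
    label: str                            # e.g. "car"
    center: Tuple[float, float, float]    # box center in world space
    size: Tuple[float, float, float]      # width, height, depth

@dataclass
class ControlSignals:
    prompt: str
    depth_map_path: Optional[str] = None          # per-frame depth, if supplied
    camera_trajectory: List[CameraPose] = field(default_factory=list)
    objects: List[ObjectBox3D] = field(default_factory=list)

def build_dolly_in(start_z: float, end_z: float, frames: int) -> List[CameraPose]:
    """Linearly interpolate the camera's z position to sketch a dolly-in move."""
    step = (end_z - start_z) / (frames - 1)
    return [
        CameraPose(position=(0.0, 1.5, start_z + i * step),
                   look_at=(0.0, 1.0, 0.0))
        for i in range(frames)
    ]

signals = ControlSignals(
    prompt="a red car driving through a forest at dusk",
    camera_trajectory=build_dolly_in(start_z=10.0, end_z=4.0, frames=16),
    objects=[ObjectBox3D("car", center=(0.0, 0.5, 0.0), size=(4.5, 1.5, 2.0))],
)
```

The idea is simply that the same generation call consumes the text prompt plus any subset of the structured signals, so users can mix and match constraints per shot.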
Additionally, Kuaishou describes a pipeline for extracting 3D bounding boxes and camera trajectories from large-scale video collections, providing strong data support for training and applying CineMaster.
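The article does not detail Kuaishou's extraction pipeline, but a common ingredient in lifting 2D detections and estimated depth to 3D annotations is back-projection through the pinhole camera model. The sketch below shows only that geometric step, with made-up intrinsics; it is not CineMaster's actual code.

```python
from typing import Tuple

def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Lift a pixel (u, v) with metric depth to a 3D point in camera space
    using the standard pinhole model: X = (u - cx) * Z / fx, etc."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Example: the principal point of a 640x480 image with assumed intrinsics
# (fx = fy = 500, cx = 320, cy = 240) at 5 m depth lands on the optical axis.
point = backproject(u=320.0, v=240.0, depth=5.0,
                    fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

In a full pipeline, points like these (e.g. the corners of a detected 2D box at their estimated depths) would be aggregated into a 3D bounding box, while per-frame camera poses from a structure-from-motion or SLAM system would supply the trajectory.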
The project page for CineMaster is now live, and interested users can visit cinemaster-dev.github.io/.