Google's research team has recently introduced ReCapture, a technology that upends the traditional approach to video editing. It allows ordinary users to easily make professional-grade camera motion adjustments, reworking the camera language of videos that have already been shot.
In traditional video post-production, changing the camera angle of footage that has already been shot has always been a technical challenge. Existing solutions often struggle to preserve both complex camera movements and image details across different types of video content. ReCapture takes a different route: instead of the conventional 4D intermediate representation, it cleverly taps the motion knowledge already stored in generative video models. Built on Stable Video Diffusion, it reframes the task as a video-to-video translation problem.
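ReCapture itself has not been released, but the base model it builds on is public. As a rough point of reference, the sketch below shows the standard diffusers usage of Stable Video Diffusion's published image-to-video pipeline; the checkpoint name and parameters follow the library's documented example, the input path is illustrative, and ReCapture's actual video-to-video conditioning is not part of this public API.

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the public Stable Video Diffusion checkpoint (image-to-video).
# ReCapture builds on this model family but adds its own conditioning,
# which is not exposed through this public pipeline.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# Condition on a single frame and sample a short clip.
frame = load_image("first_frame.png")  # illustrative input path
result = pipe(frame, decode_chunk_size=8)
export_to_video(result.frames[0], "generated.mp4", fps=7)
```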
The system employs a two-stage workflow. The first stage generates an "anchor video," an initial version of the output with the new camera trajectory. This anchor can be created with multi-view diffusion models such as CAT3D, or through frame-by-frame depth estimation and point cloud rendering (sketched below). Although this version may show temporal inconsistencies and visual artifacts, it lays the groundwork for the second stage.
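The paper's exact rendering pipeline is not public, but the depth-based route can be illustrated with a minimal numpy sketch: lift each pixel into a 3D point cloud using estimated depth and camera intrinsics, then reproject the points under a new camera pose. The function names, the pinhole-camera model, and the toy inputs in the demo are all illustrative assumptions, not ReCapture's implementation.

```python
import numpy as np

def unproject(depth, K):
    """Lift each pixel to a 3D camera-space point using per-pixel depth and intrinsics K."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T          # per-pixel rays with z = 1
    return rays * depth.reshape(-1, 1)       # scale rays by depth -> 3D points

def reproject(points, colors, K, R, t, hw):
    """Render the colored point cloud from a new camera pose (R, t) with a z-buffer."""
    h, w = hw
    cam = points @ R.T + t                   # move points into the new camera frame
    z = cam[:, 2]
    valid = z > 1e-6                         # keep points in front of the camera
    proj = cam[valid] @ K.T
    uv = (proj[:, :2] / proj[:, 2:3]).round().astype(int)
    inb = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    img = np.zeros((h, w, 3), dtype=np.uint8)
    zbuf = np.full((h, w), np.inf)
    for (x, y), zval, c in zip(uv[inb], z[valid][inb], colors[valid][inb]):
        if zval < zbuf[y, x]:                # keep only the nearest point per pixel
            zbuf[y, x] = zval
            img[y, x] = c
    return img

if __name__ == "__main__":
    # Toy demo: a flat scene re-rendered with a small pan to the right.
    # In practice, depth would come from an off-the-shelf monocular
    # depth estimator applied frame by frame (not shown here).
    h, w = 64, 64
    K = np.array([[60.0, 0, w / 2], [0, 60.0, h / 2], [0, 0, 1.0]])
    depth = np.full((h, w), 2.0)
    frame = np.random.randint(0, 255, (h, w, 3), dtype=np.uint8)
    yaw = np.deg2rad(5.0)
    R = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                  [0, 1, 0],
                  [-np.sin(yaw), 0, np.cos(yaw)]])
    t = np.array([0.05, 0.0, 0.0])
    anchor_frame = reproject(unproject(depth, K), frame.reshape(-1, 3), K, R, t, (h, w))
    print(anchor_frame.shape)
```

Pixels that become occluded or leave the frame have no source point, so the rendered anchor frames contain holes and flicker, exactly the kind of imperfections the second stage is designed to clean up.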
The second stage applies masked video refinement, using a generative video model fine-tuned on existing footage to produce realistic motion and temporal coherence. The system introduces temporal LoRA (Low-Rank Adaptation) layers so the model can learn and replicate the anchor video's specific dynamics without retraining the entire network, while spatial LoRA layers keep image details and content consistent under the new camera movement. Together these adapters allow the generative video model to perform operations such as zooming, panning, and tilting while preserving the original video's motion characteristics.
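ReCapture's adapter code has not been released; the sketch below is a generic PyTorch rendering of the idea: a low-rank adapter wraps a frozen linear projection, and separate adapters are attached to temporal versus spatial layers by module name. The `name_filter` convention, the rank and alpha values, and the commented-out usage lines are assumptions about a typical video diffusion architecture, not ReCapture's actual configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank residual: y = Wx + (alpha/r) * B(Ax)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # pretrained weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)           # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

def inject_lora(model: nn.Module, name_filter: str, rank: int = 8):
    """Wrap every nn.Linear whose qualified name contains `name_filter`.

    Filters like "temporal" or "spatial" assume the video model names its
    attention blocks that way -- an assumption, not a fixed API.
    """
    targets = []
    for name, module in model.named_modules():
        for child_name, child in module.named_children():
            if isinstance(child, nn.Linear) and name_filter in f"{name}.{child_name}":
                targets.append((module, child_name, child))
    for module, child_name, child in targets:
        setattr(module, child_name, LoRALinear(child, rank))

# Illustrative usage on a hypothetical video U-Net: temporal adapters would be
# trained to capture the anchor video's dynamics, spatial adapters to preserve
# the source video's appearance under the new camera move.
# inject_lora(video_unet, "temporal", rank=8)
# inject_lora(video_unet, "spatial", rank=8)
```

Because only the small `down`/`up` matrices receive gradients, each adapter adds a tiny fraction of the base model's parameters, which is what makes per-video fine-tuning practical without retraining the whole model.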
Although ReCapture marks a significant step toward user-friendly video processing, it is still in the research phase and remains some way off from commercial application. Notably, although Google has numerous video AI projects, none has reached the market yet, with the Veo project likely the closest to commercialization. Similarly, Meta's recently launched Movie Gen model and the Sora model OpenAI unveiled earlier this year have not been commercialized either. For now, the video AI market is led mainly by startups such as Runway, which released its latest Gen-3 Alpha model last summer.