FLATTEN

Consistent text-to-video editing via optical flow guided attention

Common Product | Image | Text-to-video editing | Optical flow
FLATTEN is an optical-flow-guided attention plugin for text-to-video editing. It addresses the frame-to-frame inconsistency of edited videos by introducing optical flow into the attention modules of the diffusion model's U-Net: patches that lie on the same optical flow path in different frames are forced to attend to each other. FLATTEN is zero-shot and can be integrated into any diffusion-based text-to-video editing method to improve its visual consistency. Experimental results show that the method achieves state-of-the-art performance on existing text-to-video editing benchmarks and is particularly strong at preserving the visual consistency of edited videos.
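
The idea of restricting attention to patches on the same optical flow path can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the flow paths have already been computed by some optical flow estimator and are given as an integer array, that the paths do not share patches, and that patches on no path pass through unchanged. The function name `flow_guided_attention` and the `trajectories` layout are hypothetical.

```python
import numpy as np

def flow_guided_attention(q, k, v, trajectories):
    """Attention restricted to patches that share an optical flow path.

    q, k, v:       float arrays of shape (frames, patches, dim)
    trajectories:  int array of shape (num_paths, frames); entry [t, f] is the
                   patch index that path t visits in frame f (assumed layout).
    """
    frames, patches, dim = q.shape
    out = v.copy()  # assumption: patches on no path keep their value
    frame_idx = np.arange(frames)

    for path in trajectories:
        # Gather the one patch per frame that lies on this path.
        q_p = q[frame_idx, path]  # (frames, dim)
        k_p = k[frame_idx, path]
        v_p = v[frame_idx, path]

        # Scaled dot-product attention across frames, along the path only.
        scores = q_p @ k_p.T / np.sqrt(dim)
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)

        # Write the attended values back to the patches on the path.
        out[frame_idx, path] = weights @ v_p

    return out
```

Because each patch only attends to its counterparts along the same path, the attention matrix per path is `frames × frames` rather than spanning every patch in every frame. With identical queries and keys the weights become uniform, so every patch on a path receives the average of the values along that path.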