VideoTetris

An innovative framework for text-to-video generation

Premium, New Product, Video, Text-to-Video, Video Generation
VideoTetris is a framework for compositional text-to-video generation. It targets complex scenes that involve multiple objects or a number of objects that changes over time. To follow such prompts closely, it uses spatio-temporal compositional diffusion: the spatial and temporal attention maps of the denoising network are manipulated and composed so that each part of the prompt affects the intended region and frames. It also introduces a reference frame attention mechanism to improve consistency in autoregressive video generation. The authors report strong qualitative and quantitative results on compositional text-to-video generation.
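To make the idea of composing attention maps more concrete, here is a minimal toy sketch in plain Python. It is not the VideoTetris implementation. The function name `compose_frames`, the flat-list frame layout, and the weighted-average blending rule are all assumptions made for illustration. Each sub-prompt contributes an attention output and a per-frame region mask. Positions covered by no mask fall back to the output for the global prompt.

```python
def compose_frames(global_out, region_outs, region_masks):
    """Blend per-sub-prompt attention outputs frame by frame (illustrative only).

    global_out:   list of frames, each a flat list of floats (one per position)
    region_outs:  one such list of frames per sub-prompt
    region_masks: matching masks with weights in [0, 1]
    """
    composed = []
    for t, global_frame in enumerate(global_out):
        frame = []
        for p, fallback in enumerate(global_frame):
            # Weight of every sub-prompt at this frame and position.
            weights = [mask[t][p] for mask in region_masks]
            total = sum(weights)
            if total == 0:
                # No sub-prompt claims this position: keep the global output.
                frame.append(fallback)
            else:
                # Mask-weighted average of the sub-prompt outputs.
                blended = sum(w * out[t][p] for w, out in zip(weights, region_outs))
                frame.append(blended / total)
        composed.append(frame)
    return composed


# Two frames, two positions each. Sub-prompt A is active in both frames,
# sub-prompt B only appears in the second frame (a changing object count).
global_out = [[0.0, 0.0], [0.0, 0.0]]
out_a = [[1.0, 1.0], [1.0, 1.0]]
out_b = [[3.0, 3.0], [3.0, 3.0]]
mask_a = [[1, 0], [1, 1]]
mask_b = [[0, 0], [0, 1]]

result = compose_frames(global_out, [out_a, out_b], [mask_a, mask_b])
```

Because the masks are indexed by frame as well as by position, the same routine can express an object that appears or disappears partway through a clip, which is the kind of scenario the description above highlights.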

VideoTetris Alternatives