Snap Video
Snap Video: A scalable spatiotemporal transformer for text-to-video synthesis.
Common Product, Video, Video Synthesis, Transformer
Snap Video is a video-first model that addresses motion fidelity, visual quality, and scalability in video generation by extending the EDM diffusion framework. Exploiting the redundancy between frames, it introduces a scalable transformer architecture that represents the spatial and temporal dimensions together as a highly compressed 1D latent vector. Modeling space and time jointly in this compressed form lets the model synthesize videos with strong temporal coherence and complex motion. The architecture can be trained efficiently at the scale of billions of parameters and achieves state-of-the-art results on multiple benchmarks.
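To make the idea of a compressed 1D latent representation more concrete, here is a minimal NumPy sketch. It is not Snap Video's actual implementation: the patch sizes, the number of latent tokens, the single-head cross-attention "read" step, and the random weights are all assumptions chosen for illustration. The helper names `patchify` and `read_into_latents` are hypothetical. The sketch only shows how a video can be flattened into one sequence of spatiotemporal patch tokens and then summarized by a much shorter sequence of latent tokens.

```python
import numpy as np

def patchify(video, pt, ph, pw):
    """Split a (T, H, W, C) video into a 1D sequence of flattened patches."""
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Patch-grid axes first, within-patch axes last.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, pt * ph * pw * C)  # (num_patches, patch_dim)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def read_into_latents(latents, tokens, w_q, w_k, w_v):
    """Single-head cross-attention: each latent token attends to all patches."""
    q = latents @ w_q                                # (L, d)
    k = tokens @ w_k                                 # (N, d)
    v = tokens @ w_v                                 # (N, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (L, N), rows sum to 1
    return attn @ v                                  # (L, d)

# Illustrative sizes only (assumptions, not values from the model).
rng = np.random.default_rng(0)
video = rng.standard_normal((16, 32, 32, 3))         # 16 frames of 32x32 RGB
tokens = patchify(video, pt=1, ph=4, pw=4)           # one token per 1x4x4 patch

num_latents, d = 64, 32
patch_dim = tokens.shape[1]
latents = rng.standard_normal((num_latents, d))      # stand-in for learned latents
w_q = rng.standard_normal((d, d)) / np.sqrt(d)
w_k = rng.standard_normal((patch_dim, d)) / np.sqrt(patch_dim)
w_v = rng.standard_normal((patch_dim, d)) / np.sqrt(patch_dim)

compressed = read_into_latents(latents, tokens, w_q, w_k, w_v)
print(tokens.shape, compressed.shape)
```

In this toy setup, 1,024 patch tokens covering all frames are summarized by 64 latent tokens, so any later attention layers would operate on a sequence 16 times shorter while still mixing information across space and time.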
Snap Video Visits Over Time
Monthly Visits: 16,148
Bounce Rate: 50.09%
Pages per Visit: 1.2
Visit Duration: 00:00:10