DiTCtrl
Explore attention control in multimodal diffusion transformers for tuning-free, multi-prompt long video generation.
Common Product, Video, Video Generation, Multimodal
DiTCtrl is a video generation model built on the Multimodal Diffusion Transformer (MM-DiT) architecture. It generates coherent multi-scene videos from multiple sequential prompts without any additional training. By analyzing the attention mechanism of MM-DiT, the model achieves precise semantic control and shares attention between different prompts, producing videos with smooth transitions and consistent object motion. Its main advantages are that it requires no training, handles multi-prompt video generation tasks, and can produce cinematic transition effects. DiTCtrl also introduces MPVBench, a new benchmark designed specifically for evaluating multi-prompt video generation.
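To make the idea of sharing attention between prompts more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not DiTCtrl's actual implementation: the function names (`attention`, `shared_attention`) and the choice to share by concatenating the previous prompt's keys and values are hypothetical, and real MM-DiT attention operates on large batched tensors rather than small lists.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        dim = len(values[0])
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def shared_attention(q_cur, k_cur, v_cur, k_prev, v_prev):
    """Hypothetical sharing scheme (an assumption for illustration):
    queries from the current prompt's segment attend to keys and values
    from both the previous segment and the current one."""
    return attention(q_cur, k_prev + k_cur, v_prev + v_cur)


# Toy tokens: one query from the current segment, one key/value per segment.
q_cur = [[1.0, 0.0]]
k_cur, v_cur = [[0.0, 1.0]], [[0.0, 1.0]]
k_prev, v_prev = [[1.0, 0.0]], [[1.0, 0.0]]

plain = attention(q_cur, k_cur, v_cur)
shared = shared_attention(q_cur, k_cur, v_cur, k_prev, v_prev)
print("current segment only:", plain)
print("with shared keys/values:", shared)
```

In this toy setup the query matches the previous segment's key, so the shared output is pulled toward the previous segment's value. That is the intuition behind carrying object appearance across a prompt change without retraining the model.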