DragNUWA, Microsoft's video generation model, brings static images to life: users drag to specify motion paths, and the model produces a seamless video that follows them. It can simultaneously control camera movement and the motion of multiple objects along complex trajectories, generating videos in both realistic and artistic-painting styles. DragNUWA combines text, image, and trajectory inputs to control video content at the semantic, spatial, and temporal levels respectively. The researchers evaluated the model on camera movement and on complex trajectories, confirming that it can model and control intricate motion precisely. Its training pipeline comprises three components: a trajectory sampler, multi-scale fusion, and adaptive training; the model is trained on the WebVid and VideoHD datasets. DragNUWA has broad potential applications in areas such as video production and animation.
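To make the trajectory input concrete, here is a minimal sketch of how a dragged motion path could be rasterized into a sparse flow map that a video model conditions on. This is an illustrative assumption, not DragNUWA's actual implementation: the function name, grid size, and encoding scheme are hypothetical.

```python
import numpy as np

def trajectory_to_flow(points, height=64, width=64):
    """Rasterize a dragged trajectory (a list of (x, y) points) into a
    sparse flow map of shape (height, width, 2).

    Each visited pixel stores the (dx, dy) displacement toward the next
    point on the path; all other pixels remain zero. A video model can
    then take this map as a spatial-temporal conditioning signal.
    """
    flow = np.zeros((height, width, 2), dtype=np.float32)
    for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
        # Clamp the point onto the grid before writing its displacement.
        r = min(max(int(round(y0)), 0), height - 1)
        c = min(max(int(round(x0)), 0), width - 1)
        flow[r, c] = (x1 - x0, y1 - y0)
    return flow

# Example: a horizontal drag from left to right along row y=5.
path = [(5, 5), (10, 5), (15, 5)]
flow = trajectory_to_flow(path)
```

In practice a dense model input would also blur or densify this sparse map, which is one plausible role of the multi-scale fusion step mentioned above.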