Following OpenAI's Sora's global impact, Alibaba's research team has once again amazed us with Tora, a new star in trajectory-based video generation, quietly revolutionizing our perception of AI video creation.
Key Features:
High Fidelity: Videos generated by Tora rival professional production quality, with top-notch color accuracy, clarity, and fluidity.
Motion Control: Tora precisely controls every movement in the video, accurately capturing both rapid motions and subtle changes.
Versatile Input: Whether it's text descriptions, static images, or dynamic trajectories, Tora can handle them all, meeting a variety of creative needs.
Unlike traditional U-Net architectures, Tora employs the more advanced Diffusion Transformer (DiT) architecture. This innovation allows Tora to easily surpass previous video generation limitations, not only producing high-quality videos up to 60 seconds long but also adapting to different resolutions and aspect ratios. What's more astonishing is that the videos generated by Tora are smooth and natural, as if perfectly replicated from the real world.
Workflow:
Trajectory Encoding: First, Tora encodes the provided trajectory information into a format it can understand.
Motion Block Generation: Then, using a 3D Variational Autoencoder (VAE), the trajectory information is compressed into motion latent representations.
Motion Fusion: Finally, the Motion Guidance Fusion (MGF) integrates these motion information into DiT blocks, generating segments of video that align with the trajectories.
Tora's core advantage lies in its unique design philosophy. It cleverly integrates text, visual, and trajectory conditions to achieve precise control over video content. Extensive experiments have confirmed that Tora maintains high motion fidelity while meticulously simulating the laws of motion in the physical world. This groundbreaking achievement undoubtedly opens up infinite possibilities for future film special effects, virtual reality, and other fields.
Compared to other methods, Tora's advantages are even more pronounced. The videos it generates are not only smoother but also strictly adhere to preset trajectories, avoiding common issues like object deformation and significantly enhancing video fidelity. This advantage is particularly evident in long, high-resolution video generation, where Tora's trajectory accuracy is three to five times that of other methods.
Here are a few official demo videos selected by AIbase:
Prompt: A serene wide-angle shot captures a sunflower gently swaying in the breeze against the backdrop of a vast golden wheat field at sunset. This scene highlights the subtle movements of the sunflower, towering amidst an endless sea of wheat, bathed in the warm orange glow of the setting sun. The golden hues of the wheat field shimmer in the twilight, adding depth and richness to the landscape. As the wind sweeps through the field, the camera lingers on the sunflower, emphasizing its vibrant petals and the intricate details of its seeds. This video contains no text or additional objects, allowing viewers to immerse themselves in the tranquil beauty and the harmonious dance of nature and light.
Prompt: A single maple leaf gracefully descends from left to right against a serene lake backdrop, surrounded by a row of vibrant maple trees. The video captures the delicate descent of the maple leaf in exquisite detail, creating a stark contrast between the leaf's gentle movement and the stillness of the water. The scene of the falling maple leaf exudes a sense of tranquility. The video focuses solely on this peaceful moment, without additional text or objects, creating an immersive experience that highlights the subtle balance between movement and stillness in nature.
Prompt: A goldfish elegantly swims across the red rocky surface of Mars. The video focuses on the smooth movements of the goldfish, starkly contrasting with the breathtaking Martian landscape. The vivid contrast between the aquatic creature and the desolate Martian terrain highlights a dreamlike atmosphere, transporting viewers into this otherworldly scene. Without additional text or objects, the video allows viewers to fully appreciate the interesting and unexpected fusion of elements.
Prompt: A flock of seagulls gracefully soars over a vibrant underwater world teeming with colorful coral reefs. The video captures the surreal combination of seagulls flying underwater, their fluid movements contrasting sharply with the surrounding dense coral. The backdrop is a vivid tapestry of marine colors, with swaying corals creating a dreamlike atmosphere. The scene contains no additional text or objects, allowing viewers to fully immerse themselves in the wonderful harmony of birds' elegance and the underwater spectacle.
Prompt: A soap bubble gently floats upward among blooming wildflowers. The video captures the delicate and iridescent nature of the soap bubble, set against a backdrop of beautiful flowers. The colorful wildflowers sway gently in the background, enhancing the fleeting beauty of the soap bubble. The scene contains no additional text or objects, allowing viewers to fully immerse themselves in the serene and enchanting moment, highlighting the fragility and transient charm of the soap bubble in a vibrant natural environment.
Tora's success is not just in its outstanding performance but also in its technological innovation. The research team cleverly combined the DiT architecture of OpenSora, while introducing innovative modules such as the Trajectory Extractor (TE) and Motion Guidance Fusion (MGF). These meticulously designed components enable Tora to encode any trajectory into hierarchical spatio-temporal motion patches and seamlessly integrate them into the video generation process.
In practical applications, Tora has demonstrated remarkable stability and adaptability. Whether processing short videos of 16 frames or complex scenes of up to 128 frames, Tora maintains excellent performance. Especially in the generation of long sequences, Tora's advantages are more pronounced, as it can effectively control the growth of trajectory errors, ensuring the continuous high quality of the generated videos.
Project Link: https://top.aibase.com/tool/tora