AI company Rhymes AI has officially open-sourced its advanced text-to-video model, Allegro. Allegro enables users to transform simple text descriptions into high-quality short video clips, opening up new possibilities for creators, developers, and researchers in the field of AI-generated video.

From a user-provided text prompt, Allegro generates high-quality videos at 720p resolution, 15 frames per second, and 6 seconds in length. It covers a wide range of cinematic subjects, from close-ups of people and animals to action scenes in varied settings: almost any scene that can be described in text.

The core technologies of Allegro include large-scale video data processing, compressing raw videos into visual tokens, and an extended video diffusion Transformer.

For large-scale video data processing, Rhymes AI designed a systematic processing and filtering pipeline that extracts training videos from raw data, along with a structured data system that classifies and clusters the data along multiple dimensions, facilitating model training and fine-tuning.
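The filtering-and-clustering step described above can be sketched as follows. This is a hypothetical illustration, not Rhymes AI's actual pipeline: the field names, thresholds, and categories are all assumptions made for the example.

```python
from collections import defaultdict

# Toy clip metadata; real pipelines would score millions of clips.
clips = [
    {"id": "a", "seconds": 6.0, "height": 720, "motion": 0.8, "category": "animals"},
    {"id": "b", "seconds": 1.2, "height": 360, "motion": 0.1, "category": "people"},
    {"id": "c", "seconds": 5.5, "height": 1080, "motion": 0.6, "category": "people"},
]

def passes_filters(clip):
    """Quality gates (illustrative): minimum length, resolution, and motion score."""
    return clip["seconds"] >= 2.0 and clip["height"] >= 480 and clip["motion"] >= 0.3

# Keep only clips that pass the gates, then bucket them by a metadata
# dimension so later fine-tuning can sample from specific subsets.
buckets = defaultdict(list)
for clip in filter(passes_filters, clips):
    buckets[clip["category"]].append(clip["id"])

print(dict(buckets))   # {'animals': ['a'], 'people': ['c']}
```

In practice such a system would bucket along several dimensions at once (subject, motion intensity, shot type), but the keep-or-drop-then-group structure is the same.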

To compress videos into visual tokens, Allegro uses a Video Variational Autoencoder (VideoVAE) that compresses raw video into a much smaller set of visual tokens while retaining the necessary detail, enabling smoother and more efficient video generation. The VideoVAE is built on a pretrained image VAE, extended with spatio-temporal modeling layers so that the image model's spatial compression capabilities carry over to video.
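A quick back-of-envelope calculation shows why this compression matters. The compression factors and patch size below are assumptions typical of video VAEs and DiT models, not Allegro's published numbers:

```python
# A 6 s, 720p, 15 fps clip, with assumed VideoVAE compression of 8x along
# height/width and 4x along time, and an assumed 2x2 spatial patch size
# when the latents are turned into transformer tokens.
frames, height, width = 6 * 15, 720, 1280    # 90 frames at 720p
t_factor, s_factor = 4, 8                    # assumed temporal / spatial compression
patch = 2                                    # assumed DiT spatial patch size

lat_t = frames // t_factor                   # latent frames
lat_h = height // s_factor                   # latent height
lat_w = width // s_factor                    # latent width
tokens = lat_t * (lat_h // patch) * (lat_w // patch)

print(lat_t, lat_h, lat_w, tokens)           # 22 90 160 79200
```

Under these assumptions the transformer attends over roughly 79,000 tokens instead of the ~100 million raw pixels per frame stack, which is what makes full attention over an entire clip tractable.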

At Allegro's core is an extended video diffusion Transformer that applies a diffusion model to generate high-resolution video frames, ensuring the quality and fluidity of motion. The backbone is built on the DiT (Diffusion Transformer) architecture, featuring 3D RoPE position embeddings and a 3D full attention mechanism. Compared with traditional diffusion models built on a UNet, the Transformer structure scales more readily. By attending jointly over space and time, DiT processes the spatial dimensions of video frames and their temporal evolution simultaneously, giving the model a more nuanced understanding of motion and context.
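To make the 3D RoPE idea concrete, here is a minimal sketch: the feature dimension of a query (or key) is split into three chunks, and each chunk is rotated by the token's coordinate along one video axis (time, height, width). All sizes are illustrative, not Allegro's actual hyperparameters.

```python
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Standard 1D RoPE: rotate feature pairs of x (shape (..., d)) by angles
    that grow with the integer position pos."""
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) / half)       # per-pair frequencies
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def rope_3d(x, t, h, w):
    """3D RoPE sketch: rotate each third of the feature dim by one of the
    token's (t, h, w) video coordinates."""
    c = x.shape[-1] // 3                            # chunk size per axis
    parts = [rope_1d(x[..., 0:c], t),
             rope_1d(x[..., c:2 * c], h),
             rope_1d(x[..., 2 * c:3 * c], w)]
    return np.concatenate(parts, axis=-1)

q = np.random.randn(48)                 # one query head; dim divisible by 6
q_rot = rope_3d(q, t=3, h=10, w=7)
print(q_rot.shape)                      # (48,)
```

Because each chunk is a pure rotation, the embedding preserves vector norms, and dot products between rotated queries and keys depend only on their relative (t, h, w) offsets, which is exactly the property that lets 3D full attention reason about spatial layout and temporal order at once.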

Rhymes AI states that Allegro is just the beginning, and the team is actively developing more advanced features, including image-to-video generation, motion control, and support for longer, narrative-based, storyboard-style video generation.

To make AI-driven video creation more accessible to a wider audience, Rhymes AI has open-sourced Allegro's model weights and code, encouraging the community to explore, unleash creativity, and build upon it, aiming for collaborative progress in AI-generated video technology.

Project link: https://github.com/rhymes-ai/Allegro