In the digital media era, video has become our primary means of self-expression and storytelling. However, creating high-quality videos typically requires professional skills and expensive equipment. Now, with Snap Video, you can automatically generate videos just by describing the scene you want in text.

\"image.png\"/

Current image generation models have already demonstrated excellent quality and diversity, and researchers have naturally begun to adapt them to video generation. However, video content is highly redundant, and directly applying image models to video reduces motion fidelity, visual quality, and scalability.

Snap Video is a video-first model that systematically addresses these challenges. First, it extends the EDM diffusion framework to account for spatially and temporally redundant pixels, so that it naturally supports video generation. Second, it proposes a new transformer-based architecture that trains about 3.31 times faster than a U-Net and is about 4.5 times faster at inference. This allows Snap Video to efficiently train text-to-video models with billions of parameters, achieve state-of-the-art results, and generate videos with much higher quality, substantially better temporal consistency, and more complex motion.

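To ground the first point, the sketch below shows the standard EDM denoiser preconditioning (Karras et al., 2022) that Snap Video starts from. Snap Video modifies this image-oriented formulation so that the diffusion process accounts for spatially and temporally redundant pixels; the exact modification is given in the paper and is not reproduced here, and the function and variable names below are purely illustrative.

```python
import torch

def edm_denoiser(raw_net, x_noisy, sigma, sigma_data=0.5):
    """Standard EDM preconditioning (Karras et al., 2022).

    x_noisy: noisy sample, e.g. a (batch, channels, frames, h, w) tensor.
    sigma:   noise level, a tensor broadcastable against x_noisy.
    raw_net: the underlying network; Snap Video replaces the usual U-Net
             with a transformer-based (FIT) backbone.
    """
    c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out = sigma * sigma_data / torch.sqrt(sigma**2 + sigma_data**2)
    c_in = 1.0 / torch.sqrt(sigma**2 + sigma_data**2)
    c_noise = torch.log(sigma) / 4.0

    # Preconditioning keeps the network's input and training target at
    # roughly unit variance across all noise levels.
    return c_skip * x_noisy + c_out * raw_net(c_in * x_noisy, c_noise)
```

In EDM, training draws the noise level from a log-normal distribution and regresses the denoiser output against the clean sample with a noise-dependent weight; the same denoiser is then plugged into an ODE/SDE sampler at generation time.
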
Technical Highlights:

Spatio-temporal Joint Modeling: Snap Video can synthesize videos with large-scale motion while retaining the semantic control capability of large-scale text-to-video generators.

High-resolution Video Generation: Using a two-stage cascaded model, Snap Video first generates a low-resolution video and then upsamples it to high resolution, avoiding potential temporal inconsistency issues (a sketch of this pipeline appears after this list).

Architecture Based on FIT: Snap Video uses the FIT (Far-reaching Interleaved Transformers) architecture, which learns a compressed video representation to make joint spatio-temporal modeling efficient (the underlying read-compute-write pattern is sketched below).
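
As a rough illustration of the last point, here is a minimal, hypothetical sketch of the read-compute-write pattern behind FIT-style architectures: a small set of learned latent tokens reads from the patchified video tokens, the bulk of computation runs on that compressed latent set, and the result is written back to the patch tokens. Class names, layer counts, and the single-group simplification are illustrative and not the paper's configuration.

```python
import torch
import torch.nn as nn

class FITBlockSketch(nn.Module):
    """Illustrative read-compute-write block in the spirit of FIT.

    Hypothetical simplification: a single latent set and one group of
    patch tokens; the real architecture interleaves local and global
    attention across many groups and layers.
    """
    def __init__(self, dim=512, num_latents=256, heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.read = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.compute = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True),
            num_layers=4)
        self.write = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, patch_tokens):
        # patch_tokens: (batch, num_patches, dim), obtained by patchifying
        # the full spatio-temporal video volume.
        b = patch_tokens.shape[0]
        z = self.latents.unsqueeze(0).expand(b, -1, -1)

        # Read: latents attend to patch tokens, compressing the video.
        z = z + self.read(z, patch_tokens, patch_tokens)[0]

        # Compute: the bulk of the work runs on the small latent set,
        # which is what makes joint spatio-temporal modeling affordable.
        z = self.compute(z)

        # Write: patch tokens attend back to the updated latents.
        patch_tokens = patch_tokens + self.write(patch_tokens, z, z)[0]
        return patch_tokens
```

Here `patch_tokens` would cover all frames of a clip at once, e.g. a tensor of shape (batch, frames × patches_per_frame, dim), which is what lets attention span space and time jointly rather than frame by frame.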

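Returning to the two-stage cascade mentioned above, the following hypothetical sketch shows the overall pipeline shape: a base model samples a low-resolution clip, and an upsampler conditioned on that entire clip produces the high-resolution result. The `sample` API, the resolutions, and the conditioning scheme are stand-ins, not Snap Video's exact configuration.

```python
import torch
import torch.nn.functional as F

def generate_video_cascade(base_model, upsampler, text_emb,
                           low_res=(16, 36, 64), high_res=(16, 288, 512)):
    """Hypothetical two-stage cascade; base_model and upsampler stand in
    for two separately trained diffusion models."""
    # Stage 1: sample a low-resolution clip (frames, height, width) that
    # captures the global motion and scene layout described by the text.
    low_res_video = base_model.sample(text_emb, shape=low_res)

    # Stage 2: the upsampler is conditioned on the *entire* low-resolution
    # clip (not frame by frame), so spatial detail is added consistently
    # across time instead of flickering between frames.
    low_res_cond = F.interpolate(low_res_video, size=high_res,
                                 mode="trilinear", align_corners=False)
    return upsampler.sample(text_emb, shape=high_res, condition=low_res_cond)
```
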
Snap Video has been evaluated on widely used benchmarks such as UCF101 and MSR-VTT, where it shows particular strength in generating high-quality motion. User studies also indicate that Snap Video outperforms recent methods in video-text alignment and in motion quantity and quality.

The paper also surveys related work in video generation, including methods based on adversarial training or autoregressive generation, as well as recent progress in applying diffusion models to text-to-video generation.

By treating videos as first-class citizens, Snap Video systematically addresses issues in both the diffusion process and the architecture that are common in text-to-video generation. Its modifications to the EDM diffusion framework and its FIT-based architecture significantly improve the quality and scalability of video generation.

Paper Address: https://arxiv.org/pdf/2402.14797