CogVideoX v1.5, an Open-Source AI Video Generation Model, Supports 5- and 10-Second Videos

AIbase基地

Published in AI News · 4 min read · Nov 8, 2024


Beijing Zhipu Huazhang Technology Co., Ltd. has released and open-sourced CogVideoX v1.5, the latest version of its CogVideoX series of video generation models. Since its launch in early August, the series has attracted strong developer interest for its technology and features. Compared with its predecessor, CogVideoX v1.5 generates 5- or 10-second videos at 768P and 16 frames per second, and its I2V (image-to-video) model accepts input of any aspect ratio. According to the company, these changes greatly improve image-to-video quality and the model's understanding of complex semantics.


The open-source release includes two models: CogVideoX v1.5-5B and CogVideoX v1.5-5B-I2V. The new version will also be rolled out on the Qingying platform, where it is combined with the newly launched CogSound audio model. There it offers improved quality, ultra-high-definition resolution, variable aspect ratios to suit different playback scenarios, multi-channel output, and AI video with sound effects.

Technically, CogVideoX v1.5 improves on its predecessor in several ways. An automated filtering framework removes video data with poor dynamic connectivity, and the end-to-end video understanding model CogVLM2-caption generates accurate descriptions of video content, which improves text comprehension and instruction following. The new version also uses an efficient three-dimensional variational autoencoder (3D VAE) to address content coherence issues. Its independently developed Transformer architecture integrates the text, time, and space dimensions, drops the traditional cross-attention module, and uses expert adaptive layer normalization to make better use of timestep information in the diffusion model.
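To make the idea of expert adaptive layer normalization more concrete, here is a minimal pure-Python sketch. It is not Zhipu's implementation: the class name `ExpertAdaLN`, the helper functions, and the tiny vector sizes are all hypothetical and chosen only for illustration. The sketch assumes that each modality (text and video) has its own "expert" that turns the diffusion timestep embedding into a per-channel scale and shift, and that the modulated tokens are then concatenated into one sequence for shared self-attention in place of cross-attention.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize one token vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(vec, weight, bias):
    """Plain dense layer: `weight` is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


class ExpertAdaLN:
    """Hypothetical per-modality expert: timestep embedding -> (scale, shift)."""

    def __init__(self, weight, bias):
        self.weight = weight  # shape (2 * dim, t_dim)
        self.bias = bias      # shape (2 * dim,)

    def __call__(self, tokens, t_emb):
        mod = linear(t_emb, self.weight, self.bias)
        dim = len(mod) // 2
        scale, shift = mod[:dim], mod[dim:]
        # Normalize each token, then apply the timestep-dependent scale and shift.
        return [
            [n * (1.0 + s) + b for n, s, b in zip(layer_norm(tok), scale, shift)]
            for tok in tokens
        ]


def joint_sequence(text_tokens, video_tokens, t_emb, text_expert, video_expert):
    """Modulate each modality with its own expert, then concatenate the tokens
    so a single self-attention layer could attend over text and video together."""
    return text_expert(text_tokens, t_emb) + video_expert(video_tokens, t_emb)
```

In this sketch the two experts share the same timestep embedding but hold separate weights, so text and video tokens can be rescaled differently at each diffusion step before they meet in the shared sequence.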

For training, the company built an efficient diffusion model training framework that achieves fast training on long video sequences through several parallel computing and time optimization techniques. The company says it has verified that scaling laws hold in video generation. It plans to expand data volume and model size, and to explore new model architectures that compress video information more efficiently and integrate text with video content more closely.

Code: https://github.com/thudm/cogvideo

Model: https://huggingface.co/THUDM/CogVideoX1.5-5B-SAT



This article is from AIbase Daily
