Today, Step Star and Geely Automobile Group announced a collaboration to jointly open source two models from the Step series of multimodal large models: the Step-Video-T2V video generation model and the Step-Audio speech model.
The Step-Video-T2V video generation model leads globally in both parameter count and performance. It has 30 billion parameters and can directly generate high-quality 204-frame videos at 540P resolution, giving the generated content high information density and strong consistency. Evaluation results show that Step-Video-T2V excels in instruction adherence, motion smoothness, physical realism, and aesthetic quality, significantly surpassing the best existing open-source video generation models.
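To put those numbers in rough perspective, the short calculation below estimates the playback length and raw pixel volume of a 204-frame clip. The 960×540 frame size and the 24 fps playback rate are assumptions for illustration only; the announcement does not specify the exact output dimensions or frame rate.

```python
# Back-of-the-envelope numbers for a 204-frame clip at "540P".
# The frame size and frame rate below are assumptions for illustration;
# neither is stated in the announcement.
frames = 204
width, height = 960, 540          # assumed 540P dimensions
fps = 24                          # hypothetical playback rate

duration_s = frames / fps
raw_bytes = frames * width * height * 3   # uncompressed RGB, 1 byte per channel

print(f"~{duration_s:.1f} s of video")             # ~8.5 s
print(f"~{raw_bytes / 1e6:.0f} MB of raw pixels")  # ~317 MB
```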
Both models are now live on the Yuewen App, where developers can try them out and share feedback.
The Step-Video-T2V video generation model demonstrates exceptional generation capabilities for complex motion, appealing characters, and visual imagination. It accurately understands instructions and efficiently helps video creators bring their ideas to the screen. Whether it is the elegance of ballet, the intensity of karate, the fast exchanges of badminton, or the rapid flips of diving, Step-Video-T2V can generate realistic scenes that obey physical laws.
It also supports a variety of camera movements and scene transitions, and can produce visually striking shots with large camera motion. Generated characters appear more realistic and vivid, with rich detail and natural expressions.
Step-Audio GitHub: https://github.com/stepfun-ai/Step-Audio
Step-Audio Hugging Face: https://huggingface.co/collections/stepfun-ai/step-audio-67b33accf45735bb21131b0b
Step-Audio Technical Report: https://github.com/stepfun-ai/Step-Audio/blob/main/assets/Step-Audio.pdf
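As a minimal, hedged sketch of getting started with the Step-Audio release linked above: the GitHub URL is taken verbatim from the list, while the Hugging Face repository id `stepfun-ai/Step-Audio-Chat` is an assumption about one checkpoint inside the linked collection, so confirm the exact model names on the collection page before downloading.

```python
# Sketch: clone the Step-Audio code linked above and pull one checkpoint
# from the linked Hugging Face collection.
import subprocess
from huggingface_hub import snapshot_download

# Repository URL copied from the links above.
subprocess.run(
    ["git", "clone", "https://github.com/stepfun-ai/Step-Audio"],
    check=True,
)

# The checkpoint repo id is an assumption; browse the linked collection to
# confirm the actual model names.
ckpt_dir = snapshot_download(
    repo_id="stepfun-ai/Step-Audio-Chat",
    local_dir="./Step-Audio-Chat",
)
print(f"Checkpoint downloaded to {ckpt_dir}")
```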