At its OpenDay event, Zhipu AI introduced Ying, a large AI model that generates videos from text. Given a text prompt, Ying can produce a high-precision 1440x960 video in about 30 seconds. Users simply enter a prompt and select a style such as cartoon 3D, black and white, oil painting, or cinematic, and a video is generated. Ying is now available to all users in the Qingyan app.
In addition to text-to-video, Ying also supports image-to-video, offering new ways to create memes, advertisements, story plots, and short videos. Meanwhile, the "Photos Come Alive" mini-program, based on Ying, will also be launched, allowing AI to animate characters or scenes in old photos.
The Ying API has also been launched on the bigmodel.cn open platform, allowing enterprises and developers to access its text-to-video and image-to-video capabilities via API calls. Ying adopts a new DiT (Diffusion Transformer) architecture that compresses video information more efficiently and integrates text and video content more fully, improving adherence to complex instructions, content coherence, and shot composition.
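As a rough illustration of what calling such a text-to-video API might look like, the sketch below builds a JSON POST request. The endpoint URL, parameter names, and authentication scheme here are illustrative assumptions only, not the documented bigmodel.cn API; consult the official platform documentation for the real interface.

```python
import json
import urllib.request

# Placeholder endpoint -- NOT the real bigmodel.cn URL; see the official docs.
API_URL = "https://open.bigmodel.cn/api/video-generation-example"

def build_video_request(prompt: str, style: str = "cinematic") -> urllib.request.Request:
    """Builds (but does not send) a hypothetical video-generation request.

    The payload fields below are assumptions for illustration; only the
    model name "cogvideox" comes from the article itself.
    """
    payload = {
        "model": "cogvideox",  # base model named in the article
        "prompt": prompt,      # text description of the desired video
        "style": style,        # e.g. "cartoon 3D", "oil painting", "cinematic"
    }
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=data,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
        },
        method="POST",
    )

# Example: build a request for a short cinematic clip.
req = build_video_request("a paper boat drifting down a rainy street")
```

Sending the request (e.g. with `urllib.request.urlopen`) and polling for the finished video would depend on the platform's actual response format, which the article does not describe.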
Zhang Peng, CEO of Zhipu AI, said at the event that Ying is built on the CogVideoX video generation model, which unifies the three dimensions of text, time, and space and draws on the algorithm design of Sora. CogVideoX runs inference six times faster than its predecessor, and higher-resolution, longer-duration video generation is planned for the future.
Users can try Ying directly through the "Ying Intelligent Agent" on the Zhipu Qingyan PC client or app, turning their inspirations into artistic video creations.
Try it here: https://chatglm.cn/video