On December 19th, Google unveiled VideoPoet, a video generation model that can produce videos up to 10 seconds long and automatically generate matching music and sound effects from the video content. VideoPoet lengthens a clip by repeatedly predicting the frames that follow the last one, which gives users the impression that a video can be extended indefinitely. Unlike most video generation models, which are built on diffusion, VideoPoet is built on a large language model. It combines text-to-video, video restoration, and video stylization in a single model, which makes it more flexible to use.
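
The extension step described above, predicting what follows the last frame and then repeating from the new end of the clip, can be illustrated with a minimal Python sketch. This is not VideoPoet's code: the names `extend_video`, `predict_next`, `context_size`, and `toy_predictor` are hypothetical, frames are represented by plain integers, and the "model" is a stand-in that simply adds one to the last frame.

```python
from typing import Callable, List, Sequence

Frame = int  # assumption: a stand-in for a real frame or a block of video tokens


def extend_video(
    frames: Sequence[Frame],
    predict_next: Callable[[Sequence[Frame]], Frame],
    extra_frames: int,
    context_size: int = 8,
) -> List[Frame]:
    """Append `extra_frames` predicted frames to the end of `frames`."""
    if not frames:
        raise ValueError("need at least one starting frame")
    video = list(frames)
    for _ in range(extra_frames):
        # Only the most recent frames are passed to the predictor.
        context = video[-context_size:]
        # The prediction becomes part of the clip and feeds the next step.
        video.append(predict_next(context))
    return video


def toy_predictor(context: Sequence[Frame]) -> Frame:
    """Hypothetical stand-in for a model: next frame = last frame + 1."""
    return context[-1] + 1


clip = extend_video([0, 1, 2], toy_predictor, extra_frames=5)
print(clip)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Because each predicted frame is appended and then reused as context, the loop can run for as many steps as desired, which is why the clip appears to be extendable without a fixed limit.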