Phenaki is a model that can generate realistic videos based on a series of text prompts. It learns video representation by compressing videos into discrete token representations. The model utilizes temporal causal attention to generate video tokens and conditionally generates videos based on pre-computed text tokens. Compared to previous video generation methods, Phenaki can generate videos of arbitrary length based on a series of prompts, such as time-varying text or stories. It is positioned to generate videos in open domains and boasts generalization capabilities exceeding the scope of existing video datasets. To better cater to user needs, Phenaki also provides interactive examples and other application scenarios.