Video Prediction Policy

A general robotic policy for multi-task manipulation based on a video diffusion model.

Video Prediction Policy (VPP) is a robotic policy built on video diffusion models (VDMs), which can accurately predict future image sequences and so demonstrate a solid understanding of physical dynamics. VPP uses visual representations taken from a VDM; because these representations reflect how the physical world is expected to evolve, they are called predictive visual representations. By combining diverse human and robot manipulation datasets under a unified video-generation training objective, VPP outperforms existing methods on two simulated and two real-world benchmarks. In particular, on the CALVIN ABC-D benchmark it achieves a 28.1% relative improvement over the previous state of the art, and it raises the success rate on complex real-world manipulation tasks by 28.8%.
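The core idea, choosing actions from features of predicted future frames rather than from the current frame alone, can be illustrated with a toy sketch. This is not VPP's implementation: `predict_future_features` is a hypothetical stand-in that linearly extrapolates per-frame feature vectors instead of running a video diffusion model, and `action_head` is a hypothetical linear policy head. All names and the vector layout are illustrative assumptions.

```python
from typing import List

Vector = List[float]


def predict_future_features(history: List[Vector], horizon: int) -> List[Vector]:
    """Hypothetical stand-in for a video prediction model: linearly
    extrapolates the last two frame-feature vectors `horizon` steps ahead."""
    if len(history) < 2:
        raise ValueError("need at least two frames to extrapolate")
    last, prev = history[-1], history[-2]
    velocity = [a - b for a, b in zip(last, prev)]
    return [[x + (k + 1) * v for x, v in zip(last, velocity)] for k in range(horizon)]


def predictive_representation(history: List[Vector], horizon: int) -> Vector:
    """Concatenate current-frame features with the predicted future features."""
    rep = list(history[-1])
    for future in predict_future_features(history, horizon):
        rep.extend(future)
    return rep


def action_head(rep: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Hypothetical linear policy head: one output per weight row."""
    return [sum(w * r for w, r in zip(row, rep)) + b for row, b in zip(weights, bias)]


if __name__ == "__main__":
    history = [[0.0, 1.0], [1.0, 1.5]]  # two toy frame-feature vectors
    rep = predictive_representation(history, horizon=2)
    print(rep)  # [1.0, 1.5, 2.0, 2.0, 3.0, 2.5]
    # Toy weights that read the features of the furthest predicted frame.
    weights = [[0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]]
    print(action_head(rep, weights, [0.0, 0.0]))  # [3.0, 2.5]
```

The point of the design is that the policy head sees where the scene is predicted to go, not just where it is now; in VPP that prediction comes from a trained VDM rather than the linear extrapolation used here.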
