Video Language Planning

Complex, long-term visual planning

Common Product, Video, Visual Planning, Multi-Modal
Video Language Planning (VLP) is an algorithm that trains vision-language models and text-to-video models to perform complex, long-term visual planning. VLP takes a long-term task instruction and the current image observation as input and outputs a detailed multi-modal (video and language) plan describing how to complete the task. It can generate long-term video plans across robotics domains, from multi-object rearrangement to multi-camera, dual-arm dexterous manipulation. The generated video plans can be converted into real robot actions through goal-conditioned policies. Experiments show that VLP substantially improves success rates on long-term tasks compared with previous methods.
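
The input/output shape described above can be illustrated with a minimal Python sketch. It is not the VLP implementation: the names plan, PlanStep, propose, rollout and score are hypothetical, the three model roles are replaced by toy string functions, and the greedy step selection is a simplification chosen for brevity, since the description above does not specify the search procedure. Converting the resulting plan into robot actions with a goal-conditioned policy is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List

Frame = str  # stand-in for an image observation; a real system would use image arrays


@dataclass
class PlanStep:
    subgoal: str        # language part of the plan
    video: List[Frame]  # video part of the plan (predicted frames)


def plan(
    instruction: str,
    observation: Frame,
    propose: Callable[[str, Frame], List[str]],   # hypothetical: language model suggesting subgoals
    rollout: Callable[[Frame, str], List[Frame]],  # hypothetical: text-to-video model, must return >= 1 frame
    score: Callable[[str, Frame], float],          # hypothetical: estimate of progress toward the task
    horizon: int = 3,
) -> List[PlanStep]:
    """Greedy sketch: at each step, imagine a short video per subgoal and keep the best one."""
    steps: List[PlanStep] = []
    current = observation
    for _ in range(horizon):
        candidates = [PlanStep(g, rollout(current, g)) for g in propose(instruction, current)]
        if not candidates:
            break
        # Rank candidates by how promising their final predicted frame looks.
        best = max(candidates, key=lambda s: score(instruction, s.video[-1]))
        steps.append(best)
        current = best.video[-1]  # continue planning from the imagined outcome
    return steps


# Toy stand-ins so the sketch runs without any trained model.
def toy_propose(instruction: str, frame: Frame) -> List[str]:
    return ["pick block", "wait"]


def toy_rollout(frame: Frame, subgoal: str) -> List[Frame]:
    return [f"{frame} -> {subgoal}"]


def toy_score(instruction: str, frame: Frame) -> float:
    return float(frame.count("pick"))


steps = plan("stack the blocks", "start", toy_propose, toy_rollout, toy_score, horizon=2)
print([s.subgoal for s in steps])
```

Each returned PlanStep pairs a language subgoal with the frames predicted for it, mirroring the combined video and language plan described above.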

Video Language Planning Visit Over Time

Monthly Visits: 275
Bounce Rate: 44.27%
Pages per Visit: 1.0
Visit Duration: 00:00:00

Video Language Planning Alternatives