Video Prediction Policy

A generalist robot policy for multi-task dexterous hand control, based on video diffusion models

- Multi-task dexterous hand control: VPP supports a range of tasks, including placing, setting a cup upright, repositioning, stacking, transferring, pressing, unplugging, and opening.
- Video diffusion models (VDMs): VPP is built on video diffusion models, which predict future image sequences and capture physical dynamics.
- Predictive visual representation: VPP uses the visual representations inside VDMs, which reflect how the physical world evolves.
- Unified video-generation training objective: Training on diverse datasets with a single video-generation objective improves the quality of the predictive visual representations.
- Simulation and real-world testing: VPP has been extensively tested on the CALVIN and MetaWorld simulation benchmarks and on real-world tasks, including Panda arm manipulation and XHand dexterous hand manipulation.
- Benchmark gains: On the CALVIN ABC-D benchmark, VPP achieved a 28.1% relative improvement over the previous state of the art, and it raised the success rate by 28.8% on complex real-world dexterous hand manipulation tasks.
- Single generalist policy: One policy executes diverse tasks, with the task selected by the instruction it is given.
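
The data flow described in the list above (a video model predicts the future, its features feed one shared controller, and the instruction selects the task) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the list-of-floats "frames", the linear extrapolation standing in for a learned video diffusion model, and the two-entry instruction table standing in for a language encoder. It is not VPP's actual implementation.

```python
from typing import Dict, List

Frame = List[float]  # toy stand-in for an image: a flat list of pixel values


def predict_future_features(prev: Frame, curr: Frame, horizon: int) -> List[Frame]:
    """Placeholder for the video model (hypothetical).

    Linearly extrapolates the change between the last two frames.
    A real video diffusion model would be a learned network.
    """
    delta = [c - p for p, c in zip(prev, curr)]
    return [[c + (k + 1) * d for c, d in zip(curr, delta)] for k in range(horizon)]


def policy(features: List[Frame], instruction: str) -> List[float]:
    """Placeholder action head shared by every task (hypothetical).

    One function serves all tasks; the instruction only changes how the
    predicted motion is turned into a toy 1-D action.
    """
    gains: Dict[str, float] = {"press": 1.0, "lift": -1.0}  # illustrative table
    gain = gains[instruction]
    first_mean = sum(features[0]) / len(features[0])
    last_mean = sum(features[-1]) / len(features[-1])
    motion = last_mean - first_mean  # how much the scene is predicted to change
    return [gain * motion]


features = predict_future_features([0.0, 0.0], [1.0, 1.0], horizon=3)
print(policy(features, "press"))  # [2.0]
print(policy(features, "lift"))   # [-2.0]
```

The point of the sketch is the structure: the same `policy` function handles both instructions, and it acts on predicted future features rather than on the current frame alone.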

Product Details

Video Prediction Policy (VPP) is a generalist robot policy built on video diffusion models (VDMs), which can accurately predict future image sequences and show a good understanding of physical dynamics. VPP uses the visual representations inside VDMs, which reflect the evolution of the physical world; these are called predictive visual representations. By combining diverse human and robot manipulation datasets under a unified video-generation training objective, VPP outperforms existing methods on simulated benchmarks and on two real-world benchmarks. In particular, on the CALVIN ABC-D benchmark VPP achieved a 28.1% relative improvement over the previous state of the art, and it increased the success rate by 28.8% on complex real-world dexterous hand manipulation tasks.
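
The two headline figures use different wording ("relative improvement" versus an "increase" in success rate), and the text does not say how either was computed. As a reading aid only, the sketch below shows the two common conventions side by side. The function names and the input values of 50 and 60 are hypothetical and are not taken from the VPP results.

```python
def absolute_gain(baseline: float, new: float) -> float:
    """Difference in the metric itself, e.g. percentage points of success rate."""
    return new - baseline


def relative_improvement(baseline: float, new: float) -> float:
    """Gain expressed as a fraction of the baseline value."""
    if baseline == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (new - baseline) / baseline


# Hypothetical scores, chosen only to make the arithmetic easy to follow.
baseline_score, new_score = 50.0, 60.0
print(absolute_gain(baseline_score, new_score))         # 10.0
print(relative_improvement(baseline_score, new_score))  # 0.2
```

With these made-up inputs, the same change reads as 10 points in absolute terms and as 20% in relative terms, which is why it matters which convention a reported percentage follows.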
