Open-Sora Plan v1.2

An Advanced Model Architecture for Text-to-Video Generation

Replaces the 2+1D architecture of earlier releases for text-to-video generation
Optimizes the CausalVideoVAE structure for better compressed visual representations and inference efficiency
Uses a 3D full attention architecture to improve the model's understanding of the world
Open-source release of code, data, and models to support community development
Trained on the Kinetics-400 video dataset, with fine-tuning using EMA weights
Evaluated with PSNR, SSIM, and LPIPS to assess video quality
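
Of the evaluation metrics listed above, PSNR is the simplest to show concretely. The following is a minimal sketch of the standard PSNR formula over flat lists of 8-bit pixel values. The function names `psnr` and `video_psnr` are illustrative assumptions and are not taken from the Open-Sora Plan codebase, which may compute the metric differently.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Hypothetical helper for illustration; not the project's own evaluation code.
    """
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must contain the same number of pixels")
    if len(reference) == 0:
        raise ValueError("inputs must not be empty")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        # Identical inputs: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def video_psnr(reference_frames, reconstructed_frames, max_value=255.0):
    """Average the per-frame PSNR over a clip given as a list of flat frames."""
    if len(reference_frames) != len(reconstructed_frames):
        raise ValueError("clips must contain the same number of frames")
    scores = [
        psnr(ref, rec, max_value)
        for ref, rec in zip(reference_frames, reconstructed_frames)
    ]
    return sum(scores) / len(scores)
```

Higher PSNR means the reconstruction is closer to the reference. SSIM and LPIPS complement it by measuring structural and perceptual similarity, which PSNR alone does not capture.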

Product Details

Open-Sora Plan v1.2 is an open-source video generation model focused on text-to-video generation. It adopts a 3D full attention architecture, optimizes the visual representation of videos, and improves inference efficiency. Because 3D full attention lets every token attend across space and time together, the model can better capture joint spatiotemporal features, offering a new technical path for the automatic generation of video content.
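
To illustrate the difference between 3D full attention and a factorized 2+1D design, the sketch below counts query-key pairs for a latent video of `frames` x `height` x `width` tokens. It is a back-of-the-envelope cost model under assumed token-grid sizes, not the model's actual implementation, and the function names are hypothetical.

```python
def full_3d_attention_pairs(frames, height, width):
    """Query-key pairs when every token attends to every token in the clip."""
    tokens = frames * height * width
    return tokens * tokens


def factorized_2plus1d_attention_pairs(frames, height, width):
    """Query-key pairs for spatial attention per frame plus temporal attention per location."""
    spatial_tokens = height * width
    # Spatial pass: each frame attends only within itself.
    spatial_pairs = frames * spatial_tokens * spatial_tokens
    # Temporal pass: each spatial location attends only along the time axis.
    temporal_pairs = spatial_tokens * frames * frames
    return spatial_pairs + temporal_pairs


if __name__ == "__main__":
    # Assumed example grid; the real latent sizes depend on the VAE and resolution.
    frames, height, width = 8, 16, 16
    print(full_3d_attention_pairs(frames, height, width))
    print(factorized_2plus1d_attention_pairs(frames, height, width))
```

The full 3D variant considers far more token pairs, which is the price of letting a token in one frame attend directly to any location in any other frame. A 2+1D design can relate such tokens only indirectly, through separate spatial and temporal passes.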

Related Projects

Qingying AI Video Generation Service

Viral Insight

ComfyUI-LivePortraitKJ

SV4D