
Open-Sora Plan v1.2
Advanced Model Architecture for Text-to-Video Generation
- Uses a 2+1D model architecture for fast text-to-video generation
- Optimizes the CausalVideoVAE structure for a better compressed visual representation and faster inference
- Uses a 3D full attention architecture to better capture joint spatiotemporal features and improve the model's understanding of the world
- Open-source release of code, data, and models to support community development
- Trained on the Kinetics-400 video dataset and fine-tuned with EMA weights
- Evaluated with PSNR, SSIM, and LPIPS to measure video reconstruction quality
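Two of the training and evaluation ideas in the list above can be illustrated in a few lines. The sketch below is a simplified illustration, not the project's training code: parameters are modeled as a plain dict of floats instead of tensors, the names `ema_update` and `psnr` are hypothetical, and the decay value is an assumption. The EMA update keeps a slowly moving average of the weights (`ema <- decay * ema + (1 - decay) * current`), and PSNR compares a reconstruction against its reference in decibels. SSIM and LPIPS are left out because SSIM needs windowed local statistics and LPIPS needs a pretrained network.

```python
import math
from typing import Dict, Sequence


def ema_update(ema: Dict[str, float], current: Dict[str, float],
               decay: float = 0.999) -> Dict[str, float]:
    """Return updated EMA weights: ema <- decay * ema + (1 - decay) * current.

    Assumption: weights are scalars keyed by name; real training uses tensors.
    """
    return {name: decay * ema[name] + (1.0 - decay) * current[name] for name in ema}


def psnr(reference: Sequence[float], reconstruction: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for pixel values in [0, max_value]."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # One EMA step pulls the averaged weight 10% of the way toward the new value.
    print(ema_update({"w": 0.0}, {"w": 1.0}, decay=0.9))
    # An error of 0.1 per pixel on [0, 1] data gives an MSE of 0.01.
    print(psnr([0.0, 1.0], [0.1, 0.9]))
```

In practice the EMA copy is updated after every optimizer step and is the one kept for fine-tuning or inference. A higher PSNR means a closer reconstruction.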
Product Details
Open-Sora Plan v1.2 is an open-source video generation model focused on text-to-video generation. It adopts a 3D full attention architecture, optimizes the compressed visual representation of videos, and improves inference efficiency. Because attention runs jointly over space and time, instead of over the spatial and temporal axes separately as in 2+1D designs, the model can better capture joint spatiotemporal features, offering a new technical path for the automatic generation of video content.
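Why VAE compression and 3D full attention go together can be seen by counting query-key pairs. The sketch below is illustrative only and rests on stated assumptions: a causal video VAE that keeps the first frame and then compresses time by 4x and space by 8x, followed by 2x2 patchification. These strides, and the names `latent_grid`, `full_3d_pairs`, and `factorized_2p1d_pairs`, are hypothetical choices, not confirmed settings of this model. Full 3D attention compares every token with every other token, while a 2+1D design does one pass within each frame and one pass across frames at each spatial position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentGrid:
    frames: int
    height: int
    width: int

    @property
    def tokens(self) -> int:
        return self.frames * self.height * self.width


def latent_grid(num_frames: int, height: int, width: int,
                t_stride: int = 4, s_stride: int = 8, patch: int = 2) -> LatentGrid:
    """Token grid after an assumed causal VAE (first frame kept, remaining frames
    compressed t_stride x in time, s_stride x in space) plus patch x patch patchify."""
    if (num_frames - 1) % t_stride != 0:
        raise ValueError("num_frames must equal 1 + k * t_stride for a causal VAE")
    if height % (s_stride * patch) or width % (s_stride * patch):
        raise ValueError("height and width must be divisible by s_stride * patch")
    return LatentGrid(
        frames=1 + (num_frames - 1) // t_stride,
        height=height // (s_stride * patch),
        width=width // (s_stride * patch),
    )


def full_3d_pairs(g: LatentGrid) -> int:
    """Query-key pairs when every token attends to every token (3D full attention)."""
    return g.tokens ** 2


def factorized_2p1d_pairs(g: LatentGrid) -> int:
    """Query-key pairs for one spatial pass (within each frame) plus one temporal
    pass (across frames at each spatial position), as in 2+1D designs."""
    spatial = g.frames * (g.height * g.width) ** 2
    temporal = g.height * g.width * g.frames ** 2
    return spatial + temporal


if __name__ == "__main__":
    grid = latent_grid(93, 720, 1280)  # hypothetical 93-frame 720p clip
    print(grid, grid.tokens)
    print("3D full:", full_3d_pairs(grid))
    print("2+1D:   ", factorized_2p1d_pairs(grid))
```

Under these assumptions the example clip becomes a 24 x 45 x 80 grid of 86,400 tokens, and full 3D attention needs roughly 24 times more query-key pairs than the factorized scheme. This is why a stronger VAE compression matters for keeping joint spatiotemporal attention affordable at inference time.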