Sana

Efficient high-resolution image synthesis framework

  • Deep compression autoencoder: Unlike traditional autoencoders, the autoencoder trained for Sana compresses images by a factor of 32, sharply reducing the number of latent tokens.
  • Linear DiT: Replaces all standard attention in the diffusion transformer with linear attention, improving efficiency at high resolution without sacrificing quality.
  • Decoder-only text encoder: Uses a modern decoder-only small language model as the text encoder, and improves image-text alignment through complex human instructions with in-context learning.
  • Efficient training and sampling: Flow-DPM-Solver reduces the number of sampling steps, while efficient caption labeling and selection accelerate convergence.
  • Competitive with modern large-scale diffusion models: Sana-0.6B performs comparably to models such as Flux-12B while being 20 times smaller and delivering over 100 times higher throughput.
  • Laptop GPU deployment: Sana-0.6B can be deployed on a 16GB laptop GPU, generating a 1024 × 1024 image in less than 1 second.
  • Open-source solution: Sana aims to provide fast, open-source AI technology for practical use.
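To illustrate the Linear DiT point above, here is a minimal pure-Python sketch of linear attention. The listing only says "linear attention"; the ReLU feature map, the function names, and the toy inputs are assumptions made for illustration, not Sana's actual implementation. The idea is that summing the key-value products once lets every query reuse them, so the cost grows linearly with the number of tokens instead of quadratically.

```python
def relu(x):
    return max(x, 0.0)


def linear_attention(Q, K, V, eps=1e-6):
    """Linear attention with an assumed ReLU feature map.

    Q, K: lists of N vectors of size d. V: list of N vectors of size dv.
    Cost is O(N * d * dv): keys and values are summarized once.
    """
    d, dv = len(Q[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]  # sum over tokens of relu(k) outer v
    k_sum = [0.0] * d                    # sum over tokens of relu(k)
    for k, v in zip(K, V):
        for i in range(d):
            ki = relu(k[i])
            k_sum[i] += ki
            for j in range(dv):
                kv[i][j] += ki * v[j]
    out = []
    for q in Q:
        pq = [relu(x) for x in q]
        denom = sum(pq[i] * k_sum[i] for i in range(d)) + eps
        out.append([sum(pq[i] * kv[i][j] for i in range(d)) / denom
                    for j in range(dv)])
    return out


def quadratic_reference(Q, K, V, eps=1e-6):
    """Same weighting computed token pair by token pair, O(N^2 * d)."""
    dv = len(V[0])
    out = []
    for q in Q:
        pq = [relu(x) for x in q]
        weights = [sum(a * relu(b) for a, b in zip(pq, k)) for k in K]
        denom = sum(weights) + eps
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / denom
                    for j in range(dv)])
    return out


# Toy inputs (hypothetical): 3 tokens, 2-dimensional queries, keys, and values.
Q = [[1.0, -2.0], [0.5, 3.0], [2.0, 1.0]]
K = [[1.0, 0.0], [-1.0, 2.0], [0.5, 0.5]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

fast = linear_attention(Q, K, V)
slow = quadratic_reference(Q, K, V)
```

Both functions produce the same result up to floating-point rounding; only the order of the multiplications differs, which is where the savings at high resolution come from.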

Product Details

Sana is a text-to-image framework that efficiently generates images at resolutions up to 4096 × 4096. It synthesizes high-resolution, high-quality images very quickly while maintaining strong text-image alignment, and it can be deployed on laptop GPUs. Sana's core design combines a deep compression autoencoder, a linear diffusion transformer (DiT), a decoder-only small language model as the text encoder, and efficient training and sampling strategies. Compared with modern large-scale diffusion models, Sana-0.6B is 20 times smaller and more than 100 times faster in measured throughput. In addition, Sana-0.6B can be deployed on a 16GB laptop GPU, generating a 1024 × 1024 image in less than 1 second. Sana makes low-cost content creation possible.
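As a rough illustration of why deep compression matters at these resolutions, the sketch below counts the latent tokens a transformer would process. It assumes that "32 times" refers to the downsampling factor along each image side, that each latent position becomes one token (patch size 1), and that a conventional autoencoder downsamples by 8; the function name and the baseline factor are illustrative assumptions, not values taken from this page.

```python
def latent_tokens(image_size, downsample, patch_size=1):
    """Number of tokens for a square image of side `image_size` pixels.

    `downsample` is the autoencoder's per-side compression factor and
    `patch_size` is the side of the latent patch merged into one token.
    """
    stride = downsample * patch_size
    if image_size % stride != 0:
        raise ValueError("image_size must be divisible by downsample * patch_size")
    side = image_size // stride
    return side * side


# Assumed comparison: 32x compression versus a hypothetical 8x baseline.
tokens_32x_1024 = latent_tokens(1024, 32)  # 32 * 32 = 1024 tokens
tokens_8x_1024 = latent_tokens(1024, 8)    # 128 * 128 = 16384 tokens
tokens_32x_4096 = latent_tokens(4096, 32)  # 128 * 128 = 16384 tokens
```

Under these assumptions, a 4096 × 4096 image at 32x compression yields the same number of tokens as a 1024 × 1024 image at 8x compression, which gives a sense of how the higher compression factor keeps high-resolution generation tractable.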