Aphrodite-engine

Pygmalion AI's large-scale inference engine

  • Continuous batching to improve inference efficiency
  • vLLM's PagedAttention to optimize key-value (KV) cache management
  • CUDA kernels optimized for different GPUs to improve inference speed
  • Support for multiple quantization schemes, such as AQLM and AWQ, to suit different hardware
  • Distributed inference for serving a large number of users
  • Multiple sampling methods, such as Mirostat and Locally Typical Sampling
  • 8-bit KV cache, allowing longer context lengths and higher throughput
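
To make the last point concrete, here is a rough back-of-the-envelope estimate of why an 8-bit KV cache leaves room for longer contexts. The function name and the model shape (32 layers, 32 KV heads, head dimension 128) are illustrative assumptions, not values taken from Aphrodite, and the estimate ignores any extra storage for quantization scaling factors.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_element):
    """Approximate KV cache size for one sequence (hypothetical helper)."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_element


GIB = 1024 ** 3

# Assumed model shape, chosen only for illustration.
shape = dict(num_layers=32, num_kv_heads=32, head_dim=128, num_tokens=4096)

fp16_gib = kv_cache_bytes(**shape, bytes_per_element=2) / GIB  # 16-bit cache
int8_gib = kv_cache_bytes(**shape, bytes_per_element=1) / GIB  # 8-bit cache

print(f"16-bit KV cache: {fp16_gib:.1f} GiB")  # 2.0 GiB
print(f" 8-bit KV cache: {int8_gib:.1f} GiB")  # 1.0 GiB
```

Under these assumptions, halving the bytes per element halves the cache for a 4096-token sequence, so the same GPU memory can hold roughly twice as many cached tokens.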

Product Details

Aphrodite is the official backend engine of Pygmalion AI. It is designed to provide the inference endpoints for the Pygmalion AI website and to serve Pygmalion models to a large number of users at high speed. Aphrodite builds on vLLM's PagedAttention and offers features such as continuous batching, efficient key-value (KV) cache management, and optimized CUDA kernels. It also supports multiple quantization schemes to improve inference performance.
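
The benefit of continuous batching can be seen in a toy simulation. The sketch below is not Aphrodite's scheduler; it is a minimal model, with hypothetical function names, that assumes each request needs a known number of decode steps and that each step produces one token for every running request.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Static batching: a batch occupies the GPU until its longest request ends."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Continuous batching: a freed slot is refilled from the queue on the next step."""
    waiting = deque(lengths)  # remaining tokens for requests not yet started
    running = []              # remaining tokens for requests currently in the batch
    steps = 0
    while waiting or running:
        # Admit waiting requests into any free slots.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decode step: every running request generates one token.
        running = [remaining - 1 for remaining in running]
        # Finished requests leave the batch immediately.
        running = [remaining for remaining in running if remaining > 0]
        steps += 1
    return steps


# One long request and three short ones, with room for two requests per batch.
lengths = [8, 2, 2, 2]
print(static_batching_steps(lengths, batch_size=2))      # 10
print(continuous_batching_steps(lengths, batch_size=2))  # 8
```

In the static case the short request in the first batch finishes early but its slot sits idle until the long request completes. With continuous batching the slot is reused right away, so the same work finishes in fewer steps.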