Aphrodite-engine

Pygmalion AI's large-scale inference engine

Continuous batch processing to improve model inference efficiency
Using vLLM's paging attention technology to optimize key value management
CUDA cores optimized for different GPUs to improve inference speed
Support multiple quantization schemes, such as AQLM, AWQ, etc., to adapt to different hardware
Distributed reasoning capability, supporting large-scale user access
Provide multiple sampling methods, such as Mirostat, Locally Typical Sampling, etc
8-bit KV cache, supporting longer context length and throughput

Product Details

Aphrodite is the official backend engine of Pygmalion AI, designed to provide inference endpoints for Pygmalion AI websites and allow Pygmalion model services to be provided to a large number of users at an extremely fast speed. Aphrodite utilizes vLLM's paging attention technology to achieve features such as continuous batch processing, efficient key value management, and optimized CUDA kernel. It supports multiple quantization schemes to improve inference performance.

Aphrodite-engine

Product Details

Related Projects

Understood zKnown

MBox AI Meet

Klee

Kerqu.Ai