Aphrodite is the official backend engine of PygmalionAI. It is designed to provide the inference endpoints for the PygmalionAI website and to serve models quickly to a large number of users. It builds on vLLM's PagedAttention to provide continuous batching, efficient key-value (KV) cache management, and optimized CUDA kernels, and it supports various quantization schemes to improve inference performance.
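
To make the idea of paged key-value management more concrete, here is a minimal sketch of block-based KV cache bookkeeping. It is not Aphrodite's or vLLM's actual code: the class `PagedKVCache`, its methods, and the block size of 16 tokens are hypothetical names and values chosen only for illustration. The sketch shows the core idea, which is that each sequence holds a table of fixed-size blocks that are allocated on demand and returned to a shared pool when the sequence finishes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BLOCK_SIZE = 16  # tokens per KV block; an illustrative value, not a real default


@dataclass
class PagedKVCache:
    """Toy bookkeeping for a paged KV cache (hypothetical, for illustration)."""

    num_blocks: int
    free_blocks: List[int] = field(init=False)
    block_tables: Dict[str, List[int]] = field(default_factory=dict)
    seq_lens: Dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every physical block starts out in the shared free pool.
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id: str) -> None:
        """Reserve cache space for one more token of the given sequence."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        # A new block is needed only when the last one is full
        # (or when the sequence has no blocks yet).
        if length % BLOCK_SIZE == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1

    def free(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

Because blocks are handed out one at a time, a sequence never reserves more than one partially filled block, and a finished sequence frees its memory immediately for requests that join the running batch. That property is what makes continuous batching practical in this kind of design.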