Generative large language models (LLMs) are renowned for their exceptional performance across a variety of tasks, including complex natural language processing, creative writing, question answering, and code generation. Increasingly, LLMs also run on user-friendly local systems, including home PCs equipped with consumer-grade GPUs. PowerInfer is a GPU-CPU hybrid inference engine that exploits the skewed distribution of neuron activations: it preloads hot-activated neurons onto the GPU for fast access while cold-activated neurons remain on the CPU for computation. Evaluations show that PowerInfer runs up to 11.69 times faster than the llama.cpp system while maintaining model fidelity. In summary, PowerInfer significantly enhances LLM inference speed, demonstrating its potential for execution on desktop computers with limited GPU capabilities.
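The hot/cold split described above can be illustrated with a minimal sketch. This is not PowerInfer's actual API; the function name, the frequency-count input, and the capacity parameter are all hypothetical, standing in for whatever profiling and placement machinery the real engine uses.

```python
def partition_neurons(activation_counts, gpu_capacity):
    """Hypothetical illustration of a hot/cold neuron split.

    `activation_counts` maps neuron id -> observed activation frequency.
    The most frequently activated ("hot") neurons, up to `gpu_capacity`,
    are assigned to the GPU; the remaining ("cold") neurons stay on the CPU.
    """
    # Rank neurons from most to least frequently activated.
    ranked = sorted(activation_counts, key=activation_counts.get, reverse=True)
    hot = set(ranked[:gpu_capacity])   # preloaded onto the GPU for fast access
    cold = set(ranked[gpu_capacity:])  # computed on the CPU on demand
    return hot, cold

# Toy profile: neuron ids with assumed activation frequencies.
counts = {0: 980, 1: 12, 2: 450, 3: 3, 4: 770}
hot, cold = partition_neurons(counts, gpu_capacity=2)
# hot  -> {0, 4}  (the two most frequently activated neurons)
# cold -> {1, 2, 3}
```

The design point this sketch captures is that placement follows activation frequency, so the small GPU memory budget is spent on the neurons most likely to be needed on every inference step.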