ExLlamaV2 is an inference library designed to run large language models efficiently on consumer-grade GPUs. It supports EXL2, a new tunable quantization format, and delivers roughly 1.5 to 2x faster inference than the original ExLlama. The project aims to be an easy-to-use LLM inference solution: it is compatible with HuggingFace model layouts and ships with interactive examples for trying models out of the box. Overall, ExLlamaV2 offers a practical way to run large language models on home GPU hardware.
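
To make the "easy-to-use" claim concrete, the snippet below sketches a minimal generation loop modeled on the example scripts in the ExLlamaV2 repository. The class and parameter names (`ExLlamaV2Config`, `ExLlamaV2BaseGenerator`, `generate_simple`, the model path, and the sampler values shown) follow the project's examples at the time of writing and may differ between versions, so treat this as an illustrative sketch rather than a pinned API reference.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point the config at a directory containing a quantized (e.g. EXL2) model;
# the path here is a placeholder.
config = ExLlamaV2Config()
config.model_dir = "/path/to/exl2-model"
config.prepare()

# Load the model weights and allocate the key/value cache on the GPU.
model = ExLlamaV2(config)
model.load()
tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)

# The base generator wraps model, cache, and tokenizer for simple one-shot generation.
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

# Sampler settings are plain attributes; these values are illustrative defaults.
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.85
settings.top_p = 0.8

output = generator.generate_simple("Once upon a time,", settings, num_tokens=150)
print(output)
```

The library also includes streaming generators for chat-style, token-by-token output; the base generator above is simply the smallest entry point for batch-style completion.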