SwiftInfer is an LLM inference acceleration library built on NVIDIA TensorRT. It uses GPU acceleration to speed up LLM inference in production environments. The project implements the Attention Sink mechanism proposed for streaming language models (StreamingLLM), which keeps the first few tokens in the KV cache alongside a sliding window of recent tokens, so generation can continue over very long, effectively unbounded text without the cache growing indefinitely. The code is concise, easy to run, and supports mainstream large language models.
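
To make the Attention Sink idea concrete, here is a minimal, illustrative sketch of the cache eviction policy in plain Python. It is not SwiftInfer's API: the names `SinkKVCache` and `stream`, and the parameters `num_sink` and `window`, are hypothetical and chosen only for this example. Integer token positions stand in for the per-layer key/value tensors a real implementation would store, and positional-encoding handling is ignored.

```python
from collections import deque


class SinkKVCache:
    """Toy attention-sink cache (hypothetical, not SwiftInfer's API).

    Keeps the first `num_sink` entries forever (the "attention sinks")
    plus a rolling window of the `window` most recent entries.
    """

    def __init__(self, num_sink=4, window=8):
        self.num_sink = num_sink
        self.sink = []                       # never evicted
        self.recent = deque(maxlen=window)   # oldest entry dropped when full

    def append(self, entry):
        # Fill the sink slots first; everything after goes to the window.
        if len(self.sink) < self.num_sink:
            self.sink.append(entry)
        else:
            self.recent.append(entry)

    def visible(self):
        # Entries the model would attend to at the current step.
        return self.sink + list(self.recent)


def stream(num_tokens, num_sink=4, window=8):
    """Feed token positions 0..num_tokens-1 and return what remains cached."""
    cache = SinkKVCache(num_sink=num_sink, window=window)
    for position in range(num_tokens):
        cache.append(position)
    return cache.visible()


if __name__ == "__main__":
    print(stream(20))  # sink positions 0-3 plus the 8 most recent positions
```

Under these assumptions the cache size is bounded by `num_sink + window` no matter how many tokens are streamed, which is what allows generation to keep going on long inputs with constant memory.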