SwiftInfer
A large language model (LLM) inference acceleration library built on the TensorRT framework that uses GPU acceleration to speed up LLM inference in production environments.
Common Product, Programming, TensorRT, Intelligent Chat
SwiftInfer is an LLM inference acceleration library based on NVIDIA TensorRT. It uses GPU acceleration to increase the inference speed of LLMs in production environments. The project implements the Attention Sink mechanism proposed for streaming language models: the key/value entries of the first few tokens are kept in the cache alongside a sliding window of recent tokens, so the cache stays a fixed size and generation can continue over text streams of effectively unlimited length. The code is concise, easy to run, and supports mainstream large language models.
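The Attention Sink idea mentioned above can be illustrated with a small cache-eviction sketch. This is not SwiftInfer's code or API: the class name `SinkKVCache` and the parameters `num_sinks` and `window` are hypothetical, and plain Python integers stand in for the per-token key/value tensors a real engine would hold on the GPU. It only shows the retention rule: keep the first few "sink" tokens plus the most recent window, and evict everything in between.

```python
class SinkKVCache:
    """Toy fixed-size cache (hypothetical, for illustration only).

    Keeps the first `num_sinks` entries (the attention sinks) and the
    most recent `window` entries; everything in between is evicted.
    """

    def __init__(self, num_sinks=4, window=8):
        if num_sinks < 0 or window < 0:
            raise ValueError("num_sinks and window must be non-negative")
        self.num_sinks = num_sinks
        self.window = window
        self.entries = []  # stand-ins for per-token key/value tensors

    def append(self, entry):
        """Add one token's entry, then evict if the cache is over budget."""
        self.entries.append(entry)
        budget = self.num_sinks + self.window
        if len(self.entries) > budget:
            sinks = self.entries[: self.num_sinks]
            # Slice from an explicit start index so window == 0 works too.
            recent = self.entries[len(self.entries) - self.window :]
            self.entries = sinks + recent

    def positions(self):
        """Positions counted inside the cache rather than in the full text."""
        return list(range(len(self.entries)))


# Simulate a stream of 20 tokens, identified by their index in the stream.
cache = SinkKVCache(num_sinks=4, window=8)
for token_id in range(20):
    cache.append(token_id)

print(cache.entries)
```

With these settings the cache never holds more than 12 entries regardless of how long the stream runs, which is what allows generation to continue without memory growing with text length. The `positions` helper reflects the assumption, taken from the streaming language model proposal, that positional information is assigned by slot in the cache rather than by original token index.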
SwiftInfer Visit Over Time
Monthly Visits: 488,643,166
Bounce Rate: 37.28%
Pages per Visit: 5.7
Visit Duration: 00:06:37