ReDrafter
Innovative technology for accelerating LLM inference on NVIDIA GPUs
Common Product, Productivity, NVIDIA GPU, LLM inference
ReDrafter is a speculative decoding method that speeds up large language model (LLM) inference on NVIDIA GPUs by combining an RNN draft model with dynamic tree attention. Faster token generation lowers the latency users experience and reduces GPU usage and energy consumption. Developed by the Apple Machine Learning Research team in collaboration with NVIDIA, ReDrafter is integrated into the NVIDIA TensorRT-LLM inference acceleration framework, making it available to machine learning developers who deploy LLMs on NVIDIA GPUs.
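The draft-then-verify idea behind this kind of decoding (commonly called speculative decoding) can be sketched in a few lines. This is a minimal toy illustration under stated assumptions, not ReDrafter's or TensorRT-LLM's implementation: `target_model` and `draft_model` are hypothetical stand-in functions rather than neural networks, decoding is greedy, and the sketch leaves out the RNN drafter and dynamic tree attention entirely. It only shows why a cheap drafter can reduce the number of expensive target-model passes without changing the output.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token context to the next token (greedy).
Model = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    # Hypothetical "large" model: a deterministic toy rule standing in for an LLM.
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_model(ctx: List[int]) -> int:
    # Hypothetical "small" drafter: agrees with the target except when
    # the context length is a multiple of 4, where it guesses wrong.
    guess = target_model(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 50


def greedy_decode(prompt: List[int], n_new: int, target: Model) -> List[int]:
    # Baseline: one target-model call per generated token.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    prompt: List[int], n_new: int, k: int, draft: Model, target: Model
) -> Tuple[List[int], int]:
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < n_new:
        # 1. The drafter proposes k candidate tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target verifies the proposal. A real system scores all k
        #    positions in one batched forward pass, so we count one pass.
        target_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first wrong token
                break
            accepted.append(tok)
        else:
            # Every drafted token matched: the same pass yields one extra token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[: len(prompt) + n_new], target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = greedy_decode(prompt, 12, target_model)
    fast, passes = speculative_decode(prompt, 12, 4, draft_model, target_model)
    print("same output:", fast == baseline)
    print("target passes:", passes, "vs", 12, "for plain greedy decoding")
```

Because every accepted token is checked against the target model, the output matches plain greedy decoding; the saving comes from accepting several tokens per target pass whenever the drafter guesses correctly.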
ReDrafter Visits Over Time
Monthly Visits: 223,751
Bounce Rate: 67.61%
Pages per Visit: 1.9
Visit Duration: 00:01:00