ReDrafter

Innovative technology for accelerating LLM inference on NVIDIA GPUs

ReDrafter is a speculative decoding method that speeds up large language model (LLM) inference on NVIDIA GPUs by combining a recurrent neural network (RNN) draft model with dynamic tree attention. Faster token generation lowers the latency users experience and reduces GPU usage and energy consumption. ReDrafter was developed by Apple's Machine Learning Research team in collaboration with NVIDIA and is integrated into NVIDIA TensorRT-LLM, NVIDIA's inference acceleration framework, so machine learning developers working on NVIDIA GPUs can use it to generate tokens faster.
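As a rough illustration only, the pure-Python sketch below shows two ideas behind the description above. It is not ReDrafter's implementation. First, draft candidates that share a prefix are merged into a single token tree, so each shared token is checked only once. Roughly speaking, this is the redundancy that dynamic tree attention removes. Second, the target model accepts the longest drafted prefix that matches its own greedy choices, then adds one token of its own. The draft beams, the greedy acceptance rule, and the `target_next` callable are all hypothetical stand-ins. A real system runs an RNN draft head and verifies the whole tree in one batched forward pass on the GPU.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def build_token_tree(beams: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Merge draft beams that share prefixes into one tree (illustrative sketch).

    Returns (tokens, parents): node i holds tokens[i]; parents[i] is the index
    of its parent node, or -1 if it attaches directly to the current context.
    """
    tokens: List[int] = []
    parents: List[int] = []
    index: Dict[Tuple[int, ...], int] = {}  # maps a token prefix to its tree node
    for beam in beams:
        parent = -1
        for depth in range(len(beam)):
            prefix = tuple(beam[: depth + 1])
            if prefix not in index:  # only new prefixes create new nodes
                index[prefix] = len(tokens)
                tokens.append(beam[depth])
                parents.append(parent)
            parent = index[prefix]
    return tokens, parents


def tree_attention_mask(parents: List[int]) -> List[List[bool]]:
    """mask[i][j] is True when node i may attend to node j (itself or an ancestor)."""
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:  # walk up the tree to the context
            mask[i][j] = True
            j = parents[j]
    return mask


def verify_greedy(
    context: Sequence[int],
    beams: Sequence[Sequence[int]],
    target_next: Callable[[List[int]], int],  # hypothetical: target model's greedy next token
) -> List[int]:
    """Keep the longest drafted prefix the target model agrees with, plus one target token.

    For clarity this calls target_next once per prefix; a real verifier scores
    every tree node in a single forward pass using the tree attention mask.
    """
    best: List[int] = []
    for beam in beams:
        accepted: List[int] = []
        for tok in beam:
            if target_next(list(context) + accepted) != tok:
                break  # first disagreement ends this candidate
            accepted.append(tok)
        if len(accepted) > len(best):
            best = accepted
    # The target model always contributes one more token after the accepted prefix.
    return best + [target_next(list(context) + best)]
```

For example, the beams `[[5, 6, 7], [5, 6, 9], [5, 8]]` contain 8 drafted tokens but collapse into a 5-node tree. With a toy target model that always predicts "previous token + 1" and the context `[4]`, verification accepts `[5, 6, 7]` and appends `8`, so four tokens are produced in one verification step.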

ReDrafter Visit Over Time

Monthly Visits: 223,751
Bounce Rate: 67.61%
Pages per Visit: 1.9
Visit Duration: 00:01:00
