StreamingLLM
An efficient framework for streaming language models using attention sinks
Common Product, Productivity, Language Model, Natural Language Processing
StreamingLLM is an efficient framework that lets a language model process inputs of effectively unbounded length without sacrificing efficiency or performance. It does this by keeping the key-value cache entries for the most recent tokens and for a few initial tokens (the attention sinks) while discarding the tokens in between, so the model can keep generating coherent text from recent context without a cache reset. Its advantage is that it can respond based on the recent conversation without refreshing the cache or depending on older data that has already been discarded.
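The retention rule described above can be illustrated with a minimal sketch. It is not the project's actual implementation: the class name `SinkCache` and the parameters `num_sink` and `window` are hypothetical, and plain token positions stand in for the key-value tensors a real cache would hold.

```python
from collections import deque


class SinkCache:
    """Toy cache: keeps the first `num_sink` entries plus the last `window` entries."""

    def __init__(self, num_sink=4, window=8):
        # Hypothetical parameters chosen for illustration only.
        self.num_sink = num_sink
        self.sinks = []                     # initial tokens, never evicted
        self.recent = deque(maxlen=window)  # rolling window of recent tokens

    def append(self, entry):
        if len(self.sinks) < self.num_sink:
            self.sinks.append(entry)
        else:
            # A full deque with maxlen drops its oldest item automatically,
            # which is how intermediate tokens get discarded.
            self.recent.append(entry)

    def contents(self):
        return self.sinks + list(self.recent)


cache = SinkCache(num_sink=2, window=3)
for position in range(20):
    cache.append(position)
print(cache.contents())  # [0, 1, 17, 18, 19]
```

The cache size stays fixed at `num_sink + window` no matter how many tokens are streamed in, which is why no reset is needed; the cost is that anything in the discarded middle is gone.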
StreamingLLM Visit Over Time
Monthly Visits: 494,758,773
Bounce Rate: 37.69%
Pages per Visit: 5.7
Visit Duration: 00:06:29