Gemma-2B-10M
The Gemma 2B model supports a 10M-token sequence length, keeps memory usage low, and is suited to large-scale language model applications.
Tags: Common Product, Programming, Language Model, Attention Mechanism
Gemma 2B - 10M Context is a large-scale language model that, through an optimized attention mechanism, can process sequences of up to 10 million tokens while using less than 32GB of memory. The model employs recurrent local attention, a technique inspired by the Transformer-XL paper, making it a powerful tool for large-scale language tasks.
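As a rough illustration of how recurrent local attention keeps memory bounded, the sketch below processes a long sequence in fixed-size segments, letting each segment attend to its own tokens plus a detached cache of the previous segment's keys and values, in the Transformer-XL style. All names, sizes, and the single-head formulation are illustrative assumptions, not the actual Gemma-2B-10M implementation.

```python
# Minimal sketch of recurrent local (segment-level) attention, inspired
# by Transformer-XL. Illustrative only; not the Gemma-2B-10M code.
import torch
import torch.nn.functional as F

def recurrent_local_attention(x, w_q, w_k, w_v, segment_len=2048):
    """Process a long sequence segment by segment. Each segment attends
    to itself (causally) plus the cached keys/values of the previous
    segment, so peak memory scales with segment_len, not sequence length."""
    batch, seq_len, d_model = x.shape
    outputs = []
    k_cache = v_cache = None  # recurrent state carried across segments
    for start in range(0, seq_len, segment_len):
        seg = x[:, start:start + segment_len]  # (B, S, D)
        q, k, v = seg @ w_q, seg @ w_k, seg @ w_v
        if k_cache is not None:
            # Prepend the previous segment's memory; detach so the cache
            # acts as fixed context (no backprop through segments).
            k = torch.cat([k_cache.detach(), k], dim=1)
            v = torch.cat([v_cache.detach(), v], dim=1)
        scores = q @ k.transpose(-2, -1) / d_model ** 0.5
        # Causal mask: token i may attend to all cached positions plus
        # segment positions <= i.
        s, total = q.size(1), k.size(1)
        mask = torch.ones(s, total, dtype=torch.bool)
        mask[:, total - s:] = torch.tril(torch.ones(s, s, dtype=torch.bool))
        scores = scores.masked_fill(~mask, float("-inf"))
        outputs.append(F.softmax(scores, dim=-1) @ v)
        # Keep only the most recent segment_len positions as memory.
        k_cache, v_cache = k[:, -segment_len:], v[:, -segment_len:]
    return torch.cat(outputs, dim=1)

# Toy usage: a 16K-token sequence processed in 2K-token segments.
d = 64
x = torch.randn(1, 16384, d)
w_q, w_k, w_v = (torch.randn(d, d) * d ** -0.5 for _ in range(3))
out = recurrent_local_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([1, 16384, 64])
```

Because only one segment's activations plus a fixed-size cache are live at any time, the working set stays constant as the sequence grows, which is the property that lets context scale to millions of tokens under a fixed memory budget.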
Gemma-2B-10M Visits Over Time
Monthly Visits: 20,899,836
Bounce Rate: 46.04%
Pages per Visit: 5.2
Visit Duration: 00:04:57