BiTA
Bi-Directional Tuning for Lossless Acceleration in Large Language Models
Common Product · Productivity · Large Language Models · Plugin
BiTA (Bi-Directional Tuning for lossless Acceleration) is an acceleration method for large language models. It speeds up generation through streamlined semi-autoregressive generation and draft candidate verification. As a lightweight plug-in module, BiTA improves the inference efficiency of existing large language models without requiring an additional auxiliary model or incurring significant extra memory cost. With BiTA applied, LLaMA-2-70B-Chat achieves a 2.7x speedup on the MT-Bench benchmark, and extensive experiments show that the method outperforms state-of-the-art acceleration techniques.
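To make the "generate drafts, then verify" idea above concrete, here is a minimal Python sketch of a generic greedy draft-and-verify decoding loop. It is an illustration of the general pattern only, not BiTA's actual prompt-tuning implementation; the `draft_next` and `target_next` callables (and the toy models in the usage example) are hypothetical placeholders.

```python
from typing import Callable, List

def draft_and_verify(
    prompt: List[int],
    draft_next: Callable[[List[int]], int],   # cheap proposal model (hypothetical placeholder)
    target_next: Callable[[List[int]], int],  # greedy token from the full model (hypothetical placeholder)
    draft_len: int = 4,
    max_new_tokens: int = 32,
) -> List[int]:
    """Greedy draft-and-verify loop: propose `draft_len` tokens cheaply, keep the
    longest prefix the target model agrees with, then append the target's token at
    the first mismatch. The output matches plain greedy decoding with the target."""
    tokens = list(prompt)
    generated = 0
    while generated < max_new_tokens:
        # 1) Draft a short candidate continuation with the cheap proposer.
        draft, ctx = [], list(tokens)
        for _ in range(draft_len):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify: accept drafted tokens while they match the target model's greedy
        #    choice. (A real implementation scores all positions in one forward pass.)
        accepted = 0
        for i in range(draft_len):
            if draft[i] != target_next(tokens + draft[:i]):
                break
            accepted += 1
        tokens += draft[:accepted]
        generated += accepted
        if accepted < draft_len:
            # First mismatch: take the target model's own token instead.
            tokens.append(target_next(tokens))
            generated += 1
    return tokens

if __name__ == "__main__":
    # Toy stand-ins: the target repeats a fixed cycle; the draft agrees most of the time.
    cycle = [1, 2, 3, 4]
    target = lambda ctx: cycle[len(ctx) % 4]
    draft = lambda ctx: cycle[len(ctx) % 4] if len(ctx) % 7 else 0  # wrong every 7th step
    print(draft_and_verify([0], draft, target, draft_len=4, max_new_tokens=12))
```

Because every accepted token is one the full model would have produced anyway, acceleration of this kind is lossless: it changes only how many target-model steps are needed, not the generated text.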
BiTA Visit Over Time
Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Visit Duration: 00:05:32