BiTA accelerates generation in large language models (LLMs) through two innovations: bi-directional tuning and tree-structured decoding. Built with a universal, pluggable design, it is well suited to real-time applications such as chatbots. By combining bi-directional tuning with semi-autoregressive (SAR) draft verification, it achieves lossless acceleration of autoregressive language models: the verified output matches what the base model would generate on its own. Across extensive generation benchmarks, BiTA delivers speedups of 2.1× to 3.3×. Its tunable prompt design makes it a plug-and-play method, applicable to any publicly available transformer-based LLM.
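To illustrate why draft verification is lossless, the sketch below shows the core acceptance rule in its simplest greedy form. This is an illustrative simplification, not BiTA's actual implementation: BiTA verifies a tree of SAR draft candidates in a single forward pass, whereas here `base_next` stands in for the base model's greedy next-token choice and drafts are checked one token at a time.

```python
from typing import Callable, List

def verify_draft(base_next: Callable[[List[int]], int],
                 context: List[int],
                 draft: List[int]) -> List[int]:
    """Accept the longest draft prefix that matches the base model's
    greedy choices, then append the base model's own next token.

    Because only tokens the base model would itself have produced are
    accepted, the final output is identical to plain autoregressive
    decoding -- the acceleration is lossless.
    """
    accepted: List[int] = []
    for tok in draft:
        expected = base_next(context + accepted)  # one verification step
        if tok == expected:
            accepted.append(tok)   # draft token confirmed
        else:
            break                  # first mismatch: discard the rest
    # the base model always contributes one guaranteed-correct token,
    # so at least one token is emitted even if the whole draft fails
    accepted.append(base_next(context + accepted))
    return accepted

# Toy deterministic "model" for demonstration: next token = last + 1 (mod 10).
toy_model = lambda ctx: (ctx[-1] + 1) % 10
print(verify_draft(toy_model, [3], [4, 5, 9]))  # → [4, 5, 6]
```

In practice the verification of all draft tokens happens in one batched forward pass rather than a Python loop, which is where the wall-clock speedup comes from: several tokens are confirmed for the cost of roughly one model call.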