The Transformer architecture has driven a revolution in natural language processing, with the self-attention mechanism at its core. However, when handling long contexts, the resource consumption of self-attention becomes a bottleneck. To address this, researchers proposed Tree Attention, which decomposes the attention computation via tree reduction: partial results are computed independently and then combined pairwise along a binary tree. This reduces communication overhead and memory usage, and the authors report speedups of up to 8x over existing methods in multi-GPU environments.
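The key idea behind tree reduction is that softmax attention over disjoint key/value chunks can be combined with an associative merge operator, so partial results can be reduced pairwise up a tree. Below is a minimal single-query NumPy sketch of that combine step; the function names and chunking scheme are illustrative assumptions, not the paper's actual API or distributed implementation.

```python
import numpy as np

def chunk_partial(q, k, v):
    # Per-chunk partial attention state: running max, exp-sum, weighted values.
    scores = k @ q                      # (chunk_len,)
    m = scores.max()
    w = np.exp(scores - m)
    return m, w.sum(), w @ v            # (max, denominator, numerator)

def merge(a, b):
    # Associative combine of two partial states: the tree-reduction operator.
    # Rescaling by exp(m_i - m) keeps the softmax numerically stable.
    (m1, s1, o1), (m2, s2, o2) = a, b
    m = max(m1, m2)
    c1, c2 = np.exp(m1 - m), np.exp(m2 - m)
    return m, c1 * s1 + c2 * s2, c1 * o1 + c2 * o2

def tree_attention(q, k_chunks, v_chunks):
    # Each chunk's partial would be computed on its own device in practice;
    # the partials are then merged pairwise in log2(num_chunks) rounds.
    parts = [chunk_partial(q, k, v) for k, v in zip(k_chunks, v_chunks)]
    while len(parts) > 1:
        parts = [merge(parts[i], parts[i + 1]) if i + 1 < len(parts) else parts[i]
                 for i in range(0, len(parts), 2)]
    m, s, o = parts[0]
    return o / s
```

Because `merge` is associative, the reduction order does not change the result, which is what allows the combine to run as a tree across GPUs instead of a sequential ring.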