Google's Infini-attention technique aims to extend Transformer-based large language models to inputs of unbounded length while keeping memory and compute bounded. Its core components are a compressive memory mechanism, the combination of local attention and long-range attention within a single layer, and support for streaming inference over segmented input. Experimental results show advantages on long-context language modeling, passkey context block retrieval over sequences up to 1M tokens, and book summarization.
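To make the mechanism concrete, here is a minimal single-head sketch of how these pieces fit together, written in PyTorch. The function and variable names (`infini_attention_segment`, `sigma`, `M`, `z`, `beta`) are hypothetical, and the sketch simplifies heavily: it omits multi-head projections and the paper's delta-rule memory variant, showing only the basic flow of local attention, memory retrieval, memory update, and gated combination.

```python
import torch
import torch.nn.functional as F

def sigma(x):
    # Non-negative feature map for linear attention (ELU + 1).
    return F.elu(x) + 1.0

def infini_attention_segment(q, k, v, M, z, beta):
    """One segment of Infini-attention (simplified, single head).

    q, k, v: (seg_len, d) queries/keys/values for the current segment
    M:       (d, d) compressive memory carried over from earlier segments
    z:       (d,) normalization term carried over from earlier segments
    beta:    scalar learned gate mixing memory and local attention
    """
    d = q.shape[-1]

    # Local causal dot-product attention within the segment.
    scores = (q @ k.T) / d**0.5
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    a_local = torch.softmax(scores.masked_fill(mask, float("-inf")), dim=-1) @ v

    # Retrieve long-range context from the compressive memory.
    sq = sigma(q)
    a_mem = (sq @ M) / (sq @ z).clamp(min=1e-6).unsqueeze(-1)

    # Update the memory with this segment's keys/values
    # (simple additive linear-attention update).
    sk = sigma(k)
    M = M + sk.T @ v
    z = z + sk.sum(dim=0)

    # Learned gate blends memory retrieval with local attention.
    g = torch.sigmoid(beta)
    out = g * a_mem + (1 - g) * a_local
    return out, M, z

# Streaming usage: iterate over fixed-size segments of a long sequence,
# carrying the compressive memory (M, z) across segments so context from
# arbitrarily far back remains retrievable at constant memory cost.
d, seg_len = 64, 128
M = torch.zeros(d, d)
z = torch.zeros(d)
beta = torch.tensor(0.0)
for seg in torch.randn(10, seg_len, d).unbind(0):
    out, M, z = infini_attention_segment(seg, seg, seg, M, z, beta)
```

Because the memory `M` has a fixed size regardless of how many segments have been processed, per-segment cost stays constant, which is what enables the streaming behavior over arbitrarily long inputs.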