Insights from Paper by Google on Infinite Context Length

Aman Kumar
2 min read · Apr 24, 2024


Paper Link: https://arxiv.org/abs/2404.07143 (Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention)

Below are my insights from the paper, summarized briefly so that anyone can follow them.

Infini-attention is a new variant of the attention mechanism, proposed by Google, that promises to let large language models (LLMs) scale to input sequences of effectively infinite length.

It introduces the concept of “compressive memory” — a compressed summary of past input that the model can attend to, in addition to the most recent context window.

To access the compressive memory, Infini-attention uses linear attention: instead of computing an N-squared score matrix over every past token, the model queries a heavily compressed, fixed-size summary of the past. This is far more computationally efficient than the standard dot-product attention it still uses over the recent context window.
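
To make the cost difference concrete, here is a minimal NumPy sketch, my own illustration rather than the paper's code, contrasting a standard softmax attention pass (whose N x N score matrix makes it quadratic in sequence length) with a linear-attention read of a fixed-size memory built from the same keys and values. The ELU+1 feature map follows the linear-attention formulation the paper builds on; all shapes are illustrative.

```python
import numpy as np

def elu_plus_one(x):
    # Non-negative feature map (ELU(x) + 1) commonly used in linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

d_key, d_value, N = 64, 64, 1024           # head dimensions and context length (illustrative)
Q = np.random.randn(N, d_key)              # queries
K = np.random.randn(N, d_key)              # keys for the whole context
V = np.random.randn(N, d_value)            # values for the whole context

# Standard dot-product attention: the N x N score matrix is what makes the cost quadratic.
scores = Q @ K.T / np.sqrt(d_key)                             # shape (N, N)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
dot_product_out = weights @ V                                 # shape (N, d_value)

# Linear-attention view: the same past is folded into a fixed-size summary.
sigma_K = elu_plus_one(K)
M = sigma_K.T @ V                          # (d_key, d_value) memory matrix, size independent of N
z = sigma_K.sum(axis=0)                    # (d_key,) running normalization term
sigma_Q = elu_plus_one(Q)
memory_out = (sigma_Q @ M) / (sigma_Q @ z)[:, None]           # roughly O(N * d^2) instead of O(N^2)
```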

This allows Infini-attention to avoid forgetting past information the way standard LLMs do once input falls outside their context window, while keeping computational costs manageable even for very long input sequences.
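
As I read the paper, the output retrieved from the compressive memory is blended with the local attention output through a learned gate, so each attention head can decide how much long-term information to pull in. A rough sketch of that combination step follows; the gate value here is a placeholder, not a trained parameter.

```python
import numpy as np

def gated_combine(local_out, memory_out, beta):
    # Scalar gate (learned per attention head, as I understand the paper) that mixes
    # the long-term memory read with local attention over the recent window.
    gate = 1.0 / (1.0 + np.exp(-beta))               # sigmoid
    return gate * memory_out + (1.0 - gate) * local_out

segment_len, d_value = 512, 64
local_out = np.random.randn(segment_len, d_value)    # dot-product attention over the current segment
memory_out = np.random.randn(segment_len, d_value)   # retrieval from the compressive memory
beta = 0.0                                           # placeholder; a trained model learns this value
combined = gated_combine(local_out, memory_out, beta)
print(combined.shape)                                # (512, 64)
```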

The compressive memory is updated recurrently as the model processes new input segments, similar to how humans continuously update their understanding while reading a book.
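
Here is a toy sketch of that recurrent loop, again my own illustration with made-up projections rather than the paper's implementation: each segment first reads from the memory, then writes its own key-value associations back in before the next segment is processed.

```python
import numpy as np

def elu_plus_one(x):
    # Non-negative feature map (ELU(x) + 1) used for the linear-attention memory.
    return np.where(x > 0, x + 1.0, np.exp(x))

d_key, d_value, seg_len = 64, 64, 512
M = np.zeros((d_key, d_value))   # compressive memory: fixed size no matter how much text is read
z = np.zeros(d_key)              # running normalization term

long_sequence = np.random.randn(8 * seg_len, d_key)   # stand-in for a long (already projected) token stream

for start in range(0, len(long_sequence), seg_len):
    segment = long_sequence[start:start + seg_len]
    Q = K = segment                  # hypothetical: real models apply learned projection matrices
    V = segment[:, :d_value]         # reuse the segment as values, purely for illustration

    # 1) Read: retrieve what the memory already "knows" about these queries.
    sigma_Q = elu_plus_one(Q)
    mem_read = (sigma_Q @ M) / (sigma_Q @ z + 1e-6)[:, None]

    # 2) Local dot-product attention over the segment would be computed here (omitted for brevity).

    # 3) Write: fold the segment's key-value associations into the memory before moving on,
    #    much like updating a running mental summary while reading a book.
    sigma_K = elu_plus_one(K)
    M = M + sigma_K.T @ V
    z = z + sigma_K.sum(axis=0)
```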

Infini-attention is likely the breakthrough that enabled Google to release Gemini 1.5 with a 1-million-token context window, a big jump over the earlier Gemini 1.0.

Infini-attention, just like its traditional counterpart, has recency bias as its main inductive bias. Because the model has full access to recent tokens but only a compressed summary of the past, the architecture assumes that recent context matters more than the distant past for modeling language.

While the compressive memory does mean some information loss, Infini-attention is a promising step towards “eternal” transformers that can scale to much longer contexts.

If you liked the article, give it a few claps, and follow me on Medium for more.
Let’s connect on Twitter. More about me here.
