🎯 Core Goals
- Teach that “attention” means every word looks at every other word.
- Show that some connections matter much more than others.
“Attention” is the mechanism LLMs use to understand context. Imagine a noisy cocktail party: if someone across the room says your name, you suddenly focus on their voice and filter out the noise. LLMs do the same with words!
👁️ Visuals & Interactives
Click any word to reveal its attention
Every word pays attention to every other word — with different strengths
📝 Key Concepts
- Looking Everywhere: When processing a sentence, the LLM mathematically compares the first word to the second word, the first to the third, the second to the third, and so on. Every word connects to EVERY other word (a toy example follows this list).
- Finding the Clues: Thick connections (high attention scores) form when words give context to each other. In an ambiguous sentence, “bank” pays high attention to “river” to know what kind of bank it is. Unrelated words get very thin connections.
- Pronoun Resolution: Attention is how an LLM figures out what “it”, “he”, or “she” refers to. The pronoun will have a massive attention score pointing back to the noun it represents.
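To make those pairwise comparisons concrete, here is a minimal Python sketch. The tiny word vectors and the sentence are invented for illustration, and plain dot products stand in for the learned query/key comparison a real Transformer performs.

```python
import numpy as np

# Toy 2-dimensional "embeddings" for the sentence "the cat chased it".
# These numbers are invented for illustration; a real model learns them.
words = ["the", "cat", "chased", "it"]
vectors = {
    "the":    np.array([0.1, 0.0]),
    "cat":    np.array([1.0, 0.9]),
    "chased": np.array([0.2, 0.7]),
    "it":     np.array([0.8, 0.9]),   # deliberately similar to "cat"
}

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Every word is compared with every other word (including itself):
# a raw score for each pair, then softmax turns the scores into weights.
for w in words:
    raw_scores = np.array([vectors[w] @ vectors[other] for other in words])
    weights = softmax(raw_scores)
    pairs = ", ".join(f"{other}: {weight:.2f}" for other, weight in zip(words, weights))
    print(f"{w:>7} attends to -> {pairs}")

# "it" ends up with its largest weight on "cat" — a toy version of
# pronoun resolution via attention.
```

Running it prints one row per word, and the row for “it” puts the most weight on “cat”: thick and thin connections, expressed as numbers.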
The Paper That Changed Everything
The paper “Attention Is All You Need” by Vaswani et al. (2017) introduced the Transformer architecture, the foundation of every modern LLM. Before Transformers, AI processed words one at a time. Transformers process ALL words simultaneously, using attention to figure out which words matter to each other.
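That “all words simultaneously” idea is visible in the paper’s scaled dot-product attention formula, softmax(QKᵀ / √d_k)·V: a single matrix multiplication scores every word against every other word at once. A minimal numpy sketch, with random matrices standing in for the learned query, key, and value projections:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8          # 4 words, 8-dimensional queries/keys/values

# Stand-ins for the learned query, key, and value vectors (one row per word).
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))

# Scaled dot-product attention from "Attention Is All You Need":
# one matrix multiplication scores every word against every other word at once.
scores = Q @ K.T / np.sqrt(d_k)                 # shape (4, 4): all word pairs
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # softmax over each row
output = weights @ V                            # each word's updated representation

print(weights.round(2))   # row i = how word i spreads its attention across the sentence
```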
Attention isn’t human understanding. It’s just a mathematical score, learned from patterns in the training data, of how relevant each word is to the others in the current sentence.
What does "attention" mean in the context of LLMs?