2.4 Context Capacity — What Fits in the Head

🎯 Core Goals

Make the concept of a “Context Window” concrete.
Visualize what 200k or 1 Million tokens actually looks like.

The “Context Window” is the absolute limit of how much text an LLM can pay attention to at one time. It’s the AI’s only “short-term memory.” Once it fills up, old information has to fall out.

👁️ Visuals & Interactives

Context Capacity

How many "tokens" can the model hold in its head at once?

📄

2,000 Tokens

Early LLMs: Equivalent to about 4 pages of text. Perfect for a quick email or a short article.

📝 Key Concepts

The Limit of Attention: The math of calculating attention between every single word gets incredibly expensive as text gets longer. This is why LLMs have a hard limit on their context window.
Short-Term Memory: The context window is an LLM’s only “memory”. If you had a 50-page conversation, but the context window only fits 40 pages, the LLM will completely forget the first 10 pages.
Growing Capabilities: Early LLMs could only remember about 2,000 tokens (a few pages). Today’s models can remember 200,000 tokens (about 2 full novels) up to 1,000,000 tokens (roughly 10 novels).

Just because a model has a 1 Million token context window doesn’t mean it pays perfect attention to everything in it. Often, LLMs suffer from the “needle in a haystack” problem, where they forget things in the middle of long text.

🧠 QUIZ

What happens when a conversation exceeds the LLM's context window?

The LLM compresses older messages to make room

The earliest parts of the conversation are effectively forgotten

The LLM asks you to start a new conversation