🎯 Core Goals
- Understand the trade-off between endless chats and a finite context window.
- Learn the strategies (truncation vs. summarization) used when the window fills up.
- See one of the MANY reasons why your LLM seems so forgetful.
When the AI’s “receipt tape” (Context Window) runs out of space, something has to be deleted. You either chop off the top of the conversation, or you write a tiny summary to replace it.
👁️ Visuals & Interactives
1. Can You Outperform the LLM?
Remember these details, then solve the puzzle.
Pick the one you remember…
Most humans mix up the digits! Distraction pushes earlier memories out — just like it does to an LLM's context window.
2. How the LLM Manages a Full Head
Imagine you've been chatting over the last few days and the message bundle has hit the model's context window limit. What happens next?
Truncation: The oldest messages simply fall off the top and are permanently forgotten.
📝 Key Concepts
Here are the common strategies systems use when the context window fills up; each is sketched in code after the list:
- Strategy 1: Truncation (Chopping): The system simply deletes your oldest messages. If you told the AI your name in Message 1, and you are now on Message 50, the AI literally no longer has the text where you stated your name. It has been permanently forgotten.
- Strategy 2: Summarization: Before the top messages fall off, the system asks the AI to write a quick summary (e.g., “User is named Peter, lives in NY”). It then deletes the old messages and pins the summary to the top. You save space, but you lose the exact nuance of the original messages.
- Strategy 3: Sliding Window: Most systems keep the most recent messages plus (usually) the very first “System Prompt” instructions, letting everything in the middle fall away.
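Here is a minimal sketch of Strategy 1 in Python. It approximates tokens as whitespace-separated words to keep the example readable; a real system would count tokens with the model's own tokenizer (e.g. tiktoken).

```python
def truncate(messages: list[str], max_tokens: int) -> list[str]:
    """Drop the oldest messages until the conversation fits the budget."""
    kept = list(messages)
    # Tokens are approximated as words here; real systems use a tokenizer.
    while kept and sum(len(m.split()) for m in kept) > max_tokens:
        kept.pop(0)  # the oldest message falls off the top, permanently
    return kept
```

Note that `truncate` has no idea what it is throwing away: if “My name is Peter” happens to be the oldest message, it is the first thing to go.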
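Strategy 2 looks almost identical, except the dropped messages are condensed into a pinned summary first. The `summarize` function below is a hypothetical stand-in for an extra LLM call:

```python
def summarize(dropped: list[str]) -> str:
    # Hypothetical stand-in: a real system would send these messages back
    # to the LLM with a prompt like "Summarize this conversation briefly."
    return f"[Summary of {len(dropped)} earlier messages]"

def compact(messages: list[str], max_tokens: int) -> list[str]:
    """Replace the oldest messages with a pinned summary when over budget."""
    kept = list(messages)
    dropped = []
    while kept and sum(len(m.split()) for m in kept) > max_tokens:
        dropped.append(kept.pop(0))
    if dropped:
        # The summary itself costs tokens, so a real system would reserve
        # budget for it. The exact wording of the originals is lost.
        kept.insert(0, "PINNED SUMMARY: " + summarize(dropped))
    return kept
```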
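And a sketch of Strategy 3, which assumes the first message in the list is the System Prompt and keeps as many recent messages as the remaining budget allows:

```python
def sliding_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the system prompt plus the newest messages that still fit."""
    # Assumes messages[0] is the pinned System Prompt.
    system_prompt, rest = messages[0], messages[1:]
    budget = max_tokens - len(system_prompt.split())
    kept: list[str] = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = len(msg.split())
        if cost > budget:
            break  # everything older than this falls away
        kept.append(msg)
        budget -= cost
    return [system_prompt] + kept[::-1]  # restore chronological order
```

The middle of the conversation silently disappears, while the instructions at the top and the freshest context survive.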
Never use a single chat thread as a permanent “workspace” for months. The context will get poisoned, the top will get truncated, and the AI will become slow and confused. Always start fresh chats for new tasks!