🎯 Core Goals
- Understand the trade-off between endless chats and a finite context window.
- Learn the strategies (truncation vs. summarization) used when the window fills up.
- See one of the MANY reasons why your LLM seems so forgetful.
When the AI’s “receipt tape” (Context Window) runs out of space, something has to be deleted. You either chop off the top of the conversation, or you write a tiny summary to replace it.
👁️ Visuals & Interactives
1. Can You Outperform the LLM?
Remember these details, then solve the puzzle.
Pick the one you remember…
Most humans mix up the digits! Distraction pushes earlier memories out — just like it does to an LLM's context window.
2. How the LLM Manages a Full Head
Imagine you've been chatting over the last few days and the message bundle has hit the model's context window limit. What happens next?
Truncation: The oldest messages simply fall off the top and are permanently forgotten.
📝 Key Concepts
Here are the common strategies systems use when the context window fills up; each is sketched in code after the list:
- Strategy 1: Truncation (Chopping): The system simply deletes your oldest messages. If you told the AI your name in Message 1, and you are now on Message 50, the AI literally no longer has the text where you stated your name. It has been permanently forgotten.
- Strategy 2: Summarization: Before the top messages fall off, the system asks the AI to write a quick summary (e.g., “User is named Peter, lives in NY”). It then deletes the old messages and pins the summary to the top. You save space, but you lose the exact nuance of the original messages.
- Strategy 3: Sliding Window: Most systems keep the most recent messages plus (usually) the very first “System Prompt” instructions, letting everything in the middle fall away.
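Here is a minimal sketch of Strategy 1 in Python. It approximates tokens as whitespace-separated words to keep the example readable; a real system would count tokens with the model's own tokenizer (e.g. tiktoken).

```python
def truncate(messages: list[str], max_tokens: int) -> list[str]:
    """Drop the oldest messages until the conversation fits the budget."""
    kept = list(messages)
    # Tokens are approximated as words here; real systems use a tokenizer.
    while kept and sum(len(m.split()) for m in kept) > max_tokens:
        kept.pop(0)  # the oldest message falls off the top, permanently
    return kept
```

Note that `truncate` has no idea what it is throwing away: if “My name is Peter” happens to be the oldest message, it is the first thing to go.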
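Strategy 2 looks almost identical, except the dropped messages are condensed into a pinned summary first. The `summarize` function below is a hypothetical stand-in for an extra LLM call:

```python
def summarize(dropped: list[str]) -> str:
    # Hypothetical stand-in: a real system would send these messages back
    # to the LLM with a prompt like "Summarize this conversation briefly."
    return f"[Summary of {len(dropped)} earlier messages]"

def compact(messages: list[str], max_tokens: int) -> list[str]:
    """Replace the oldest messages with a pinned summary when over budget."""
    kept = list(messages)
    dropped = []
    while kept and sum(len(m.split()) for m in kept) > max_tokens:
        dropped.append(kept.pop(0))
    if dropped:
        # The summary itself costs tokens, so a real system would reserve
        # budget for it. The exact wording of the originals is lost.
        kept.insert(0, "PINNED SUMMARY: " + summarize(dropped))
    return kept
```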
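And a sketch of Strategy 3, which assumes the first message in the list is the System Prompt and keeps as many recent messages as the remaining budget allows:

```python
def sliding_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the system prompt plus the newest messages that still fit."""
    # Assumes messages[0] is the pinned System Prompt.
    system_prompt, rest = messages[0], messages[1:]
    budget = max_tokens - len(system_prompt.split())
    kept: list[str] = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = len(msg.split())
        if cost > budget:
            break  # everything older than this falls away
        kept.append(msg)
        budget -= cost
    return [system_prompt] + kept[::-1]  # restore chronological order
```

The middle of the conversation silently disappears, while the instructions at the top and the freshest context survive.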
Never use a single chat thread as a permanent “workspace” for months. The context will get poisoned, the top will get truncated, and the AI will become slow and confused. Always start fresh chats for new tasks!