4.1 Counting Problems — Pattern Matching, Not Math

🎯 Core Goals

Understand why LLMs struggle with basic spelling and counting tasks.
Connect this limitation back to tokenization.

Because LLMs read in “chunks” (tokens) rather than individual letters, asking an LLM to count the letters in a word is like asking you to count the individual threads in a sweater while standing 10 feet away.

👁️ Visuals & Interactives

The "Strawberry" Test

Why LLMs struggle to count letters

The Question:

"How many 'r's are in strawberry?"

What the LLM sees (Tokens):

straw berry

It doesn't see the individual 'r's inside the chunks!

🤖 AI Response:

"There are 2 'r's in strawberry."

Reality check: s-t-r-a-w-b-e-r-r-y (There are 3!)
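To make the demo concrete, here is a minimal Python sketch of the two views of the same word. The two-token split is copied from the example above and is an assumption (real tokenizers may split "strawberry" differently), and the token IDs in `toy_vocab` are made up for illustration.

```python
# Hypothetical token split taken from the demo above; real tokenizers may differ.
tokens = ["straw", "berry"]
word = "".join(tokens)

# What the model works with: one opaque ID per token (these IDs are invented).
toy_vocab = {"straw": 101, "berry": 202}
token_ids = [toy_vocab[t] for t in tokens]

# What ordinary code works with: the individual characters.
r_per_token = {t: t.count("r") for t in tokens}
total_r = word.count("r")

print(token_ids)    # [101, 202]
print(r_per_token)  # {'straw': 1, 'berry': 2}
print(total_r)      # 3
```

Nothing in `[101, 202]` tells you how many 'r's are hiding inside each chunk, while the character-level count is a one-liner for regular code.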

📝 Key Concepts

The Strawberry Problem: If you ask an LLM “How many ‘r’s are in strawberry?”, it often says 2. Why? Because it may see the word as just a couple of tokens, such as “straw” and “berry” (the exact split depends on the tokenizer), and each token reaches the model as a single unit rather than a sequence of letters. A human can break “straw” down letter by letter: s, t, r, a, w. For an LLM, “straw” is one Lego block, and the letters inside it are not directly visible.
Math vs. Language: LLMs are predictors of language, not calculators. When they get simple math right, it’s often because problems like 2+2=4 appear millions of times in their training data, not because they are carrying out the arithmetic step by step the way a calculator does.
Pattern Matching: If you ask an LLM to add long strings of numbers, it might guess a plausible-sounding sum based on patterns it’s seen before, rather than performing actual addition.
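For contrast, here is what "performing actual addition" looks like as a procedure. This is a small illustrative sketch of schoolbook addition (right to left, with a carry); the function name `add_digit_strings` and the sample numbers are made up for this example, and the result is checked against Python's built-in integer arithmetic.

```python
def add_digit_strings(x: str, y: str) -> str:
    """Add two non-negative decimal numbers given as digit strings."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)  # pad the shorter number with zeros
    carry = 0
    digits = []
    # Work from the rightmost digit to the leftmost, carrying as needed.
    for dx, dy in zip(reversed(x), reversed(y)):
        total = int(dx) + int(dy) + carry
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))


a = "98765432109876543210"
b = "12345678901234567890"
result = add_digit_strings(a, b)

print(result)
print(result == str(int(a) + int(b)))  # True: matches Python's exact integers
```

Every digit of the answer comes from a rule being applied, which is why this approach gives the same correct result no matter how long the numbers get. A guess based on familiar patterns offers no such guarantee.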

Never rely on an LLM alone for the exact number of words, characters, or sentences in a paragraph. On its own, the model does not see the text at that granular level, so exact counts call for specialized tools (for example, code that processes the raw text).
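As an example of such a tool, here is a short Python sketch that counts characters, words, and sentences directly from the raw text. The helper name `text_stats` and the sample sentence are made up for illustration, and the sentence splitting is a deliberately simple assumption (it splits after ".", "!", or "?" followed by whitespace), so abbreviations like "e.g." would trip it up.

```python
import re


def text_stats(text: str) -> dict:
    """Count characters, words, and sentences with simple rules."""
    words = text.split()  # words = runs of non-whitespace characters
    # Naive sentence split: break after ., !, or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return {
        "characters": len(text),
        "words": len(words),
        "sentences": len(sentences),
    }


sample = "LLMs read tokens. People read letters! Who counts better?"
stats = text_stats(sample)
print(stats)
```

Because the code walks over the actual characters, the counts are exact for the rules it applies. Handing the counting to a tool like this and letting the LLM handle the language is usually the safer division of labor.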

🧠 QUIZ

Why can't LLMs reliably count the letters in a word?

They haven't been trained on enough counting examples

They process text as token chunks, not individual characters

Counting is too complex for any AI system