7.4 Beyond Vector Search — Other Ways to Get Knowledge Into Context

RAG doesn't require a vector database. Keyword search, subagents, and file concatenation each have their place. Know your options before you build.

🎯 Core Goals

  • Clarify that RAG is a pattern (retrieve, augment, generate), not a specific technology.
  • Show the different retrieval methods and when each one fits.
  • Help readers match the right approach to the right situation.

RAG is a three-step pattern: retrieve documents, augment the prompt, generate the answer. The retrieval step can use vector search, keyword search, SQL, or anything else that finds relevant content. Don’t assume you need a vector database — match the retrieval method to the actual problem.
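In code, the pattern is just three steps in a row. Here is a minimal sketch, not any specific library's API: retrieve() and llm() are placeholders for whatever search method and model provider you choose.

```python
def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Step 1 -- Retrieve: any method that finds relevant documents."""
    raise NotImplementedError("plug in keyword search, embeddings, SQL, ...")

def llm(prompt: str) -> str:
    """Placeholder for a call to any LLM API."""
    raise NotImplementedError("wire up your model provider here")

def rag_answer(query: str, corpus: list[str]) -> str:
    docs = retrieve(query, corpus)                 # 1. Retrieve
    prompt = (                                     # 2. Augment
        "Answer using only the documents below.\n\n"
        + "\n---\n".join(docs)
        + f"\n\nQuestion: {query}"
    )
    return llm(prompt)                             # 3. Generate
```

Everything in the rest of this section is a different way to fill in that retrieve() step, or a way to skip it entirely.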

The Lawyer’s Options

Let’s return to Sarah’s 500-case problem. She needs to get the right cases in front of the LLM. Here are the realistic approaches — each with honest trade-offs. Some are retrieval methods you could plug into a RAG pipeline; others skip retrieval entirely.

“RAG” and “vector database” are often used interchangeably, but that conflates the pattern with one implementation of it. Vector databases are a popular way to build the retrieval step; they are not a requirement.

Approaches to Getting Knowledge Into Context

Each approach has a different character, and a different best use case.

🔤 Keyword Search: the classic, zero-setup approach
  • Strengths: instant, works in any doc system; no embedding, no vector DB needed
  • Weaknesses: binary matching, a word is either present or absent; misses synonyms and paraphrasing
  • Best for: consistent terminology; also works as a simple RAG retrieval step

📋 File Concatenation: paste everything in, ask away
  • Strengths: literally zero setup; the LLM sees the full context, no retrieval errors
  • Weaknesses: hits the context window limit fast; doesn't scale beyond a few docs
  • Best for: tiny datasets, under a dozen short docs

🤖 Subagent Sequential Read: an LLM reads each doc and decides
  • Strengths: deep understanding, no retrieval errors; handles nuance and context well
  • Weaknesses: slow, reads 500 cases one at a time; expensive, pays per doc on every query
  • Best for: small sets where thoroughness outweighs speed and cost

🔍 RAG: retrieve, augment, generate, with any retrieval method
  • Strengths: flexible, works with any retrieval method; scales to millions of documents
  • Weaknesses: vector-based RAG requires setup (embeddings, DB); retrieval can miss or mis-rank docs
  • Best for: large, diverse knowledge bases at scale

Option 1 — Keyword Search (a valid RAG retrieval method)

The oldest approach: find documents containing specific words.

It’s binary — either the word is present or it isn’t. It has no concept of synonyms, paraphrasing, or context. The 2019 case titled “contractor failed to uphold terms” won’t appear when you search for “breach of contract.”
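To make that concrete, here is a sketch of word-level keyword retrieval (the stopword list and scoring are illustrative assumptions, not a standard). It shows exactly the miss described above:

```python
import re

STOPWORDS = {"a", "an", "the", "of", "in", "to", "and"}

def tokenize(text: str) -> set[str]:
    """Word-level tokens, lowercased, minus trivial stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def keyword_retrieve(query: str, docs: list[str], k: int = 5) -> list[str]:
    """Rank documents by how many query terms they literally contain."""
    q = tokenize(query)
    scored = sorted(((len(q & tokenize(d)), d) for d in docs), reverse=True)
    return [d for score, d in scored if score > 0][:k]

cases = [
    "2021 ruling: breach of contract in a software licensing dispute.",
    "2019 ruling: contractor failed to uphold terms of the agreement.",
]

print(keyword_retrieve("breach of contract", cases))
# -> only the 2021 case. 'contractor' != 'contract' at the word level,
#    and nothing in the 2019 case literally says 'breach'.
```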

But keyword search is a legitimate retrieval step inside RAG. You can build a RAG pipeline where the retrieval is just keyword matching — retrieve matching docs, inject them into the prompt, let the LLM answer from them. Same pattern, simpler search engine.

Its character: fast, cheap, zero setup — and surprisingly effective when your organization uses consistent terminology.

Option 2 — Subagent / Sequential Read

An LLM reads through documents one at a time and decides what’s relevant.

This approach is thorough and nuanced — the LLM understands context, not just keywords. But it pays the full cost for every document it touches: 500 cases means 500 reads, every single query.

Its character: deep understanding, no retrieval errors — at the price of being slow and expensive at scale.
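Here is a sketch of that loop, assuming the same hypothetical llm() helper standing in for any model API. For Sarah's corpus, each query costs 500 relevance calls plus one final answer call:

```python
def llm(prompt: str) -> str:
    """Placeholder for any LLM API call."""
    raise NotImplementedError("wire up your model provider here")

def sequential_answer(query: str, documents: list[str]) -> str:
    relevant = []
    for doc in documents:  # one LLM call per document, every query
        verdict = llm(
            "Does this document help answer the question? Reply YES or NO.\n"
            f"Question: {query}\n\nDocument:\n{doc}"
        )
        if verdict.strip().upper().startswith("YES"):
            relevant.append(doc)
    # A final call answers from only the documents judged relevant.
    context = "\n---\n".join(relevant)
    return llm(f"Using only these documents:\n{context}\n\nQuestion: {query}")
```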

Option 3 — File Concatenation

Merge everything into one text block and send it all in as context.

This is the simplest possible approach, with no infrastructure whatsoever. But it runs into a hard wall: the context window. A handful of short documents? Fine. Sarah’s 500 cases? They will never fit.

Its character: zero setup, zero retrieval errors — for datasets small enough to fit.
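A sketch of the whole approach, again with a placeholder llm() helper. The 128k-token window and the rough 4-characters-per-token estimate are assumptions; real limits and tokenizers vary by model:

```python
from pathlib import Path

def llm(prompt: str) -> str:
    """Placeholder for any LLM API call."""
    raise NotImplementedError("wire up your model provider here")

CONTEXT_LIMIT_TOKENS = 128_000  # assumption: adjust for your model

def concat_and_ask(query: str, folder: str) -> str:
    # Merge every .txt file into one block; this is the entire "pipeline".
    text = "\n\n---\n\n".join(
        p.read_text() for p in sorted(Path(folder).glob("*.txt"))
    )
    if len(text) / 4 > CONTEXT_LIMIT_TOKENS:  # ~4 chars per token
        raise ValueError("corpus no longer fits the context window; "
                         "time to switch to a retrieval-based approach")
    return llm(text + f"\n\nQuestion: {query}")
```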

Most production RAG systems don’t use just one retrieval method — they combine them. Keyword search narrows the candidate pool first, then vector similarity re-ranks the results, and sometimes a subagent does a final read of the top candidates. This hybrid retrieval is still RAG — the pattern is retrieve, augment, generate. The search engine just happens to use multiple strategies. You get the speed of keyword filtering with the precision of semantic matching.
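A sketch of that two-stage idea, with embed() as a placeholder for any embedding model (an assumption, not a specific API): a cheap keyword filter prunes the corpus, then cosine similarity over embeddings orders what survives.

```python
import math

def embed(text: str) -> list[float]:
    """Placeholder for any embedding model call."""
    raise NotImplementedError("call your embedding provider here")

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_retrieve(query: str, docs: list[str], k: int = 5) -> list[str]:
    # Stage 1: cheap keyword filter narrows hundreds of docs to a pool.
    terms = set(query.lower().split())
    pool = [d for d in docs if terms & set(d.lower().split())]
    # Stage 2: semantic similarity re-ranks the survivors.
    qv = embed(query)
    return sorted(pool, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]
```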

When you use ChatGPT or Gemini through their web interfaces, you don’t get to choose the retrieval mechanism; you get whatever they’ve built in. The distinction matters once you start building your own LLM-powered products or integrating an LLM API into your systems, because then the retrieval method is your decision.

📝 Key Concepts

  • RAG is a pattern, not a product: retrieve, augment, generate. The retrieval step can use any method.
  • Keyword search is a valid RAG retrieval method, not an alternative to RAG.
  • Hybrid retrieval is still RAG: combining keyword and semantic search is the most common production approach.
  • Concatenation skips retrieval entirely; it is genuinely a different approach from RAG.
  • Concatenation is underrated for small datasets. Don't over-engineer early.

🧠 QUIZ

Which statement about RAG is most accurate?

  • RAG requires a vector database to store document embeddings
  • RAG is an alternative to keyword search for finding documents
  • RAG is a pattern where any retrieval method can feed documents into the LLM's prompt
  • RAG only works with semantic search powered by embeddings