7.4 Beyond Vector Search — Other Ways to Get Knowledge Into Context

RAG doesn't require a vector database. Keyword search, subagents, and file concatenation each have their place. Know your options before you build.

🎯 Core Goals

  • Clarify that RAG is a pattern (retrieve, augment, generate), not a specific technology.
  • Show the different retrieval methods and when each one fits.
  • Help readers match the right approach to the right situation.

RAG is a three-step pattern: retrieve documents, augment the prompt, generate the answer. The retrieval step can use vector search, keyword search, SQL, or anything else that finds relevant content. Don’t assume you need a vector database — match the retrieval method to the actual problem.
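In code, the pattern is just three steps in a row. Here is a minimal sketch, not any specific library's API: retrieve() and llm() are placeholders for whatever search method and model provider you choose.

```python
def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Step 1 -- Retrieve: any method that finds relevant documents."""
    raise NotImplementedError("plug in keyword search, embeddings, SQL, ...")

def llm(prompt: str) -> str:
    """Placeholder for a call to any LLM API."""
    raise NotImplementedError("wire up your model provider here")

def rag_answer(query: str, corpus: list[str]) -> str:
    docs = retrieve(query, corpus)                 # 1. Retrieve
    prompt = (                                     # 2. Augment
        "Answer using only the documents below.\n\n"
        + "\n---\n".join(docs)
        + f"\n\nQuestion: {query}"
    )
    return llm(prompt)                             # 3. Generate
```

Everything in the rest of this section is a different way to fill in that retrieve() step, or a way to skip it entirely.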

The Lawyer’s Options

Let’s return to Sarah’s 500-case problem. She needs to get the right cases in front of the LLM. Here are the realistic approaches — each with honest trade-offs. Some are retrieval methods you could plug into a RAG pipeline; others skip retrieval entirely.

“RAG” and “vector database” are often used interchangeably, but that conflates the pattern with one implementation of it. Vector databases are a popular way to build the retrieval step; they are not a requirement.

Approaches to Getting Knowledge Into Context

Each approach has a different character, and a different best use case.

🔤 Keyword Search: the classic, zero-setup approach
  • Strengths: instant, works in any doc system; no embedding, no vector DB needed
  • Weaknesses: binary matching, a word is either present or absent; misses synonyms and paraphrasing
  • Best for: consistent terminology; also works as a simple RAG retrieval step

📋 File Concatenation: paste everything in, ask away
  • Strengths: literally zero setup; the LLM sees the full context, no retrieval errors
  • Weaknesses: hits the context window limit fast; doesn't scale beyond a few docs
  • Best for: tiny datasets, under a dozen short docs

🤖 Subagent Sequential Read: an LLM reads each doc and decides
  • Strengths: deep understanding, no retrieval errors; handles nuance and context well
  • Weaknesses: slow, reads 500 cases one at a time; expensive, pays per doc on every query
  • Best for: small sets where thoroughness outweighs speed and cost

🔍 RAG: retrieve, augment, generate, with any retrieval method
  • Strengths: flexible, works with any retrieval method; scales to millions of documents
  • Weaknesses: vector-based RAG requires setup (embeddings, DB); retrieval can miss or mis-rank docs
  • Best for: large, diverse knowledge bases at scale

Option 1 — Keyword Search (a valid RAG retrieval method)

The oldest approach: find documents containing specific words.

It’s binary — either the word is present or it isn’t. It has no concept of synonyms, paraphrasing, or context. The 2019 case titled “contractor failed to uphold terms” won’t appear when you search for “breach of contract.”
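To make that concrete, here is a sketch of word-level keyword retrieval (the stopword list and scoring are illustrative assumptions, not a standard). It shows exactly the miss described above:

```python
import re

STOPWORDS = {"a", "an", "the", "of", "in", "to", "and"}

def tokenize(text: str) -> set[str]:
    """Word-level tokens, lowercased, minus trivial stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def keyword_retrieve(query: str, docs: list[str], k: int = 5) -> list[str]:
    """Rank documents by how many query terms they literally contain."""
    q = tokenize(query)
    scored = sorted(((len(q & tokenize(d)), d) for d in docs), reverse=True)
    return [d for score, d in scored if score > 0][:k]

cases = [
    "2021 ruling: breach of contract in a software licensing dispute.",
    "2019 ruling: contractor failed to uphold terms of the agreement.",
]

print(keyword_retrieve("breach of contract", cases))
# -> only the 2021 case. 'contractor' != 'contract' at the word level,
#    and nothing in the 2019 case literally says 'breach'.
```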

But keyword search is a legitimate retrieval step inside RAG. You can build a RAG pipeline where the retrieval is just keyword matching — retrieve matching docs, inject them into the prompt, let the LLM answer from them. Same pattern, simpler search engine.

Its character: fast, cheap, zero setup — and surprisingly effective when your organization uses consistent terminology.

Option 2 — Subagent / Sequential Read

An LLM reads through documents one at a time and decides what’s relevant.

This approach is thorough and nuanced — the LLM understands context, not just keywords. But it pays the full cost for every document it touches: 500 cases means 500 reads, every single query.

Its character: deep understanding, no retrieval errors — at the price of being slow and expensive at scale.
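Here is a sketch of that loop, assuming the same hypothetical llm() helper standing in for any model API. For Sarah's corpus, each query costs 500 relevance calls plus one final answer call:

```python
def llm(prompt: str) -> str:
    """Placeholder for any LLM API call."""
    raise NotImplementedError("wire up your model provider here")

def sequential_answer(query: str, documents: list[str]) -> str:
    relevant = []
    for doc in documents:  # one LLM call per document, every query
        verdict = llm(
            "Does this document help answer the question? Reply YES or NO.\n"
            f"Question: {query}\n\nDocument:\n{doc}"
        )
        if verdict.strip().upper().startswith("YES"):
            relevant.append(doc)
    # A final call answers from only the documents judged relevant.
    context = "\n---\n".join(relevant)
    return llm(f"Using only these documents:\n{context}\n\nQuestion: {query}")
```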

Option 3 — File Concatenation

Merge everything into one text block and send it all in as context.

This is the simplest possible approach, with no infrastructure whatsoever. But it runs into a hard wall: the context window. A handful of short documents? Fine. Sarah’s 500 cases? They will never fit.

Its character: zero setup, zero retrieval errors — for datasets small enough to fit.
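A sketch of the whole approach, again with a placeholder llm() helper. The 128k-token window and the rough 4-characters-per-token estimate are assumptions; real limits and tokenizers vary by model:

```python
from pathlib import Path

def llm(prompt: str) -> str:
    """Placeholder for any LLM API call."""
    raise NotImplementedError("wire up your model provider here")

CONTEXT_LIMIT_TOKENS = 128_000  # assumption: adjust for your model

def concat_and_ask(query: str, folder: str) -> str:
    # Merge every .txt file into one block; this is the entire "pipeline".
    text = "\n\n---\n\n".join(
        p.read_text() for p in sorted(Path(folder).glob("*.txt"))
    )
    if len(text) / 4 > CONTEXT_LIMIT_TOKENS:  # ~4 chars per token
        raise ValueError("corpus no longer fits the context window; "
                         "time to switch to a retrieval-based approach")
    return llm(text + f"\n\nQuestion: {query}")
```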

Most production RAG systems don’t use just one retrieval method — they combine them. Keyword search narrows the candidate pool first, then vector similarity re-ranks the results, and sometimes a subagent does a final read of the top candidates. This hybrid retrieval is still RAG — the pattern is retrieve, augment, generate. The search engine just happens to use multiple strategies. You get the speed of keyword filtering with the precision of semantic matching.
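A sketch of that two-stage idea, with embed() as a placeholder for any embedding model (an assumption, not a specific API): a cheap keyword filter prunes the corpus, then cosine similarity over embeddings orders what survives.

```python
import math

def embed(text: str) -> list[float]:
    """Placeholder for any embedding model call."""
    raise NotImplementedError("call your embedding provider here")

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_retrieve(query: str, docs: list[str], k: int = 5) -> list[str]:
    # Stage 1: cheap keyword filter narrows hundreds of docs to a pool.
    terms = set(query.lower().split())
    pool = [d for d in docs if terms & set(d.lower().split())]
    # Stage 2: semantic similarity re-ranks the survivors.
    qv = embed(query)
    return sorted(pool, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]
```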

When you use ChatGPT or Gemini through their web interfaces, you don’t get to choose the retrieval mechanism; you get whatever they’ve built in. The distinction matters once you start building your own LLM-powered products or integrating an LLM API into your systems, because then the retrieval method is your decision.

📝 Key Concepts

  • RAG is a pattern, not a product: retrieve, augment, generate. The retrieval step can use any method.
  • Keyword search is a valid RAG retrieval method, not an alternative to RAG.
  • Hybrid retrieval is still RAG: combining keyword and semantic search is the most common production approach.
  • Concatenation skips retrieval entirely; it is genuinely a different approach from RAG.
  • Concatenation is underrated for small datasets. Don't over-engineer early.

🧠 QUIZ

Which statement about RAG is most accurate?

  • RAG requires a vector database to store document embeddings
  • RAG is an alternative to keyword search for finding documents
  • RAG is a pattern where any retrieval method can feed documents into the LLM's prompt
  • RAG only works with semantic search powered by embeddings