Why RAG Needs Multiple Retrieval Layers
Part 1: Beyond Vector Search - Why RAG Needs Multiple Retrieval Layers
This is Part 1 of a 4-part series on building production-ready, multi-layer RAG systems with Ragforge.
- Part 1: Beyond Vector Search (you are here)
- Part 2: The Multi-Hop Retrieval Pipeline
- Part 3: Microservices Architecture
- Part 4: From Development to Production
Introduction: The State of RAG in 2025
Retrieval-Augmented Generation (RAG) has revolutionized how we build AI applications that need to answer questions based on private or specialized knowledge. The basic recipe is simple:
- Store your documents as vector embeddings
- Find semantically similar chunks for a user’s query
- Pass those chunks to an LLM
- Generate an answer
This approach works remarkably well for straightforward factual lookups. Ask “What is photosynthesis?” and a vector search will find relevant biology textbook passages. The LLM synthesizes a clear answer.
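To make the recipe concrete, here is a minimal sketch of that single-layer pipeline in Python. It assumes the sentence-transformers library and a tiny in-memory corpus; the model name and the commented-out LLM call are placeholder choices, not part of any specific framework.

```python
# Minimal single-layer RAG sketch (illustrative only; model name is an example choice).
# Assumes: pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "Arjuna was one of the greatest warriors in the Kurukshetra war.",
    "Krishna served as Arjuna's charioteer throughout the war.",
    "After Day 14, Bhima stepped forward to protect the Pandava position.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")

# 1. Store documents as vector embeddings (normalised so dot product == cosine similarity)
doc_vectors = model.encode(documents, normalize_embeddings=True)

# 2. Find semantically similar chunks for the user's query
query = "What was Krishna's role in the war?"
query_vector = model.encode(query, normalize_embeddings=True)
scores = doc_vectors @ query_vector
top_chunks = [documents[i] for i in np.argsort(scores)[::-1][:2]]

# 3. + 4. Pass those chunks to an LLM and generate an answer
prompt = "Answer using only this context:\n" + "\n".join(top_chunks) + f"\n\nQuestion: {query}"
# answer = your_llm_client.generate(prompt)  # placeholder: any chat/completions API works here
```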
But what happens when queries become more complex?
The Problem: When Vector Search Isn’t Enough
Consider this question about the Mahabharata epic:
“Who helped Arjuna after Day 10 of the war?”
A traditional RAG system using only vector search would:
- Embed the question into a vector
- Find the most semantically similar document chunks
- Pass those chunks to the LLM
- Hope the answer is in there
Why This Fails
This question requires understanding:
- Temporal relationships: Events happening after a specific day
- Social connections: Who was allied with whom
- Causal chains: X happened because Y influenced Z
- Multi-step reasoning: Finding information across multiple related documents
Vector search struggles here because:
1. Semantic similarity ≠ Logical relevance
   - The most similar passages might mention “Arjuna” and “Day 10” but not discuss who helped him afterward
   - A passage about Arjuna’s allies from Day 5 is semantically similar but temporally wrong
2. Timeline information is often implicit
   - “The next day, Krishna advised…” - What day was that?
   - Vector embeddings don’t capture temporal ordering well
3. Relationship context is distributed
   - Arjuna’s allies are mentioned in one document
   - Battle timelines are in another
   - Specific aid after Day 10 is in a third
   - No single chunk contains the complete answer
4. Causal chains require multi-hop reasoning
   - Krishna helped because he was Arjuna’s charioteer
   - Bhima helped because he was Arjuna’s brother
   - This “because” reasoning isn’t captured by cosine similarity
A Real Example of Vector Search Failure
Let’s visualize what happens:
Query: "Who helped Arjuna after Day 10 of the war?"
Query Embedding: [0.23, -0.41, 0.89, ...]

Top 3 Results by Cosine Similarity:
─────────────────────────────────────────────────────
Rank 1 (score: 0.92)
"Arjuna was one of the greatest warriors in the Kurukshetra war.
On Day 10, he fought valiantly against the Kaurava forces..."

Problem: Mentions Arjuna and Day 10, but says nothing about who helped him afterward
─────────────────────────────────────────────────────
Rank 2 (score: 0.89)
"Krishna served as Arjuna's charioteer throughout the war,
providing strategic guidance and moral support..."

Problem: Mentions help but no temporal marker for "after Day 10"
─────────────────────────────────────────────────────
Rank 3 (score: 0.87)
"After Day 14, when Arjuna was exhausted, his brother Bhima
stepped forward to protect the Pandava position..."

Closer! Has a temporal marker and a helper, but it's Day 14, not Day 10
─────────────────────────────────────────────────────

The right answer might be buried at rank 47 because it mentions different names or uses different phrasing.
What We Actually Need
To answer complex questions, we need a system that can:
1. Understand Entity Relationships
Arjuna -[ALLY_OF]-> Krishna
Arjuna -[BROTHER_OF]-> Bhima
Arjuna -[PARTICIPATED_IN]-> Kurukshetra War

When we search for “Arjuna,” the system should know to also look at Krishna and Bhima’s activities.
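With a graph in place, that expansion is a one-hop traversal. A rough sketch using the Neo4j Python driver, where the connection details, the Entity label, and the seed data are illustrative assumptions rather than Ragforge's actual schema:

```python
# One-hop graph expansion around a seed entity (illustrative Neo4j schema and credentials).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

CYPHER = """
MATCH (a:Entity {name: $name})-[r:ALLY_OF|BROTHER_OF]-(other:Entity)
RETURN other.name AS related, type(r) AS relation
"""

with driver.session() as session:
    for record in session.run(CYPHER, name="Arjuna"):
        # With the toy graph above this yields Krishna (ALLY_OF) and Bhima (BROTHER_OF)
        print(record["related"], "via", record["relation"])

driver.close()
```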
2. Reason About Time and Sequence
Day 10: [Event A, Event B]
Day 11: [Event C]   ← Search here for "after Day 10"
Day 12: [Event D]

The system needs to filter results based on temporal constraints, not just semantic similarity.
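If each chunk carries a day marker in its metadata, that constraint becomes an ordinary filter applied alongside similarity scoring. A small illustrative sketch; the day field, scores, and chunk texts are assumed:

```python
# Apply the "after Day 10" constraint as a metadata filter on retrieved chunks.
# The `day` key is an assumed metadata convention, not a fixed schema.
chunks = [
    {"text": "On Day 10, Arjuna fought valiantly against the Kaurava forces...", "day": 10, "score": 0.92},
    {"text": "On Day 11, Krishna continued as Arjuna's charioteer...",           "day": 11, "score": 0.81},
    {"text": "On Day 5, Arjuna's allies regrouped...",                           "day": 5,  "score": 0.88},
]

def after_day(candidates, day):
    """Keep only chunks whose events happen strictly after the given day."""
    return [c for c in candidates if c.get("day") is not None and c["day"] > day]

filtered = sorted(after_day(chunks, day=10), key=lambda c: c["score"], reverse=True)
print([c["text"] for c in filtered])  # only the Day 11 chunk survives
```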
3. Traverse Multiple Hops of Information
Hop 1: Find documents about Arjuna (semantic search)
   ↓
Hop 2: Expand to his known allies (graph traversal)
   ↓
Hop 3: Filter for events after Day 10 (temporal filter)
   ↓
Result: "Krishna continued as charioteer, Bhima protected flanks"

4. Combine Multiple Evidence Sources
- Semantic: “Arjuna needed support”
- Symbolic: Krishna is linked to Arjuna via ALLY_OF relationship
- Temporal: Document has metadata {day: 11, after_day_10: true}
- Synthesis: All three signals point to the right answer
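One simple way to combine these signals is a weighted score per candidate chunk, as in the sketch below. The weights and field names are illustrative, not tuned Ragforge values:

```python
# Fuse semantic, symbolic, and temporal evidence into one ranking score per chunk.
# Weights are illustrative; a production system would tune or learn them.
def fused_score(chunk, graph_neighbors, min_day):
    semantic = chunk["similarity"]                                  # cosine similarity from vector search
    symbolic = 1.0 if chunk["entity"] in graph_neighbors else 0.0   # linked to the query entity in the graph?
    temporal = 1.0 if chunk.get("day", -1) > min_day else 0.0       # satisfies "after Day 10"?
    return 0.5 * semantic + 0.3 * symbolic + 0.2 * temporal

graph_neighbors = {"Krishna", "Bhima"}  # e.g. from the ALLY_OF / BROTHER_OF expansion above
chunk = {"entity": "Krishna", "similarity": 0.83, "day": 11}
print(fused_score(chunk, graph_neighbors, min_day=10))  # 0.915
```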
Introducing Multi-Layer RAG
What if we could combine three complementary approaches?
The Three Pillars
                      MULTI-LAYER RAG

 ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
 │   SEMANTIC   │   │   SYMBOLIC   │   │  MULTI-HOP   │
 │              │   │              │   │              │
 │    Vector    │   │  Knowledge   │   │  Iterative   │
 │  Embeddings  │   │    Graph     │   │   Context    │
 │              │   │              │   │  Expansion   │
 │   pgvector   │   │    Neo4j     │   │              │
 └──────────────┘   └──────────────┘   └──────────────┘

1. Semantic Layer (Dense Retrieval)
- What: Traditional vector search using embeddings
- When: First-hop recall, finding semantically similar content
- How: Sentence transformers → pgvector → cosine similarity
- Strength: Great for finding passages about the same topic
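In practice, the semantic layer boils down to a single nearest-neighbour query against pgvector. A hedged sketch, assuming a chunks table with an embedding vector column (the table, column, model, and connection string are illustrative):

```python
# Nearest-neighbour search with pgvector via psycopg (table, column, and DSN are assumed).
import psycopg
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
query_vec = model.encode("Who helped Arjuna after Day 10 of the war?").tolist()
vec_literal = "[" + ",".join(f"{x:.6f}" for x in query_vec) + "]"  # pgvector text format

with psycopg.connect("dbname=ragforge user=postgres") as conn:
    rows = conn.execute(
        """
        SELECT id, content, 1 - (embedding <=> %s::vector) AS cosine_similarity
        FROM chunks
        ORDER BY embedding <=> %s::vector   -- <=> is pgvector's cosine-distance operator
        LIMIT 5
        """,
        (vec_literal, vec_literal),
    ).fetchall()

for chunk_id, content, similarity in rows:
    print(f"{similarity:.3f}  {content[:80]}")
```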
2. Symbolic Layer (Graph Reasoning)
- What: Knowledge graph with entities and relationships
- When: Understanding connections, hierarchies, and constraints
- How: Entity extraction → graph database → traversal queries
- Strength: Captures relationships that aren’t semantically similar
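Building this layer means merging extracted (subject, relation, object) triples into the graph. A minimal loading sketch; the Entity label, credentials, and the sample triples are illustrative assumptions:

```python
# Merge extracted (subject, relation, object) triples into Neo4j idempotently.
from neo4j import GraphDatabase

triples = [
    ("Arjuna", "ALLY_OF", "Krishna"),
    ("Arjuna", "BROTHER_OF", "Bhima"),
    ("Arjuna", "PARTICIPATED_IN", "Kurukshetra War"),
]

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for subject, relation, obj in triples:
        # Relationship types cannot be parameterised in Cypher, so the type is interpolated;
        # acceptable here because `relation` comes from a controlled extraction vocabulary.
        session.run(
            f"MERGE (s:Entity {{name: $s}}) "
            f"MERGE (o:Entity {{name: $o}}) "
            f"MERGE (s)-[:{relation}]->(o)",
            s=subject, o=obj,
        )
driver.close()
```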
3. Multi-Hop Layer (Iterative Expansion)
- What: Progressively expand context through multiple retrieval rounds
- When: Answer requires information from multiple sources
- How: Hop 1 (vector) → Hop 2 (graph) → Hop 3 (exact match)
- Strength: Finds distributed information and synthesizes context
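At the orchestration level, the three hops compose into one loop in which each round expands or narrows the candidate pool before the next runs. A schematic sketch; the three helper callables stand in for the vector, graph, and metadata calls sketched elsewhere in this post and are not Ragforge's actual API:

```python
# Schematic multi-hop retrieval: vector recall -> graph expansion -> exact/metadata filtering.
from typing import Callable

def multi_hop_retrieve(
    query: str,
    vector_search: Callable[[str], list[dict]],
    graph_expand: Callable[[set[str]], set[str]],
    metadata_filter: Callable[[list[dict]], list[dict]],
) -> list[dict]:
    # Hop 1: broad semantic recall over the whole corpus
    candidates = vector_search(query)

    # Hop 2: expand to chunks about entities connected to what Hop 1 found
    seed_entities = {c["entity"] for c in candidates if c.get("entity")}
    related = graph_expand(seed_entities)
    candidates += [c for c in vector_search(" ".join(related)) if c.get("entity") in related]

    # Hop 3: enforce hard constraints (temporal windows, exact matches) on the expanded pool
    return metadata_filter(candidates)
```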
The Ragforge Approach
Ragforge is a production-ready implementation of multi-layer RAG that orchestrates all three layers intelligently. It is built as a set of microservices designed to be straightforward to deploy and operate.
High-Level Architecture
             ┌─────────────┐
             │    Query    │
             └──────┬──────┘
                    │
                    ▼
       ┌────────────────────────┐
       │  Query Understanding   │
       │  (Parse, Extract,      │
       │   Classify Intent)     │
       └────────────┬───────────┘
                    │
      ┌─────────────┼─────────────┐
      │             │             │
      ▼             ▼             ▼
┌──────────┐  ┌───────────┐  ┌──────────┐
│ Semantic │  │ Ontology  │  │ Temporal │
│Expansion │  │ Expansion │  │ Filtering│
└─────┬────┘  └─────┬─────┘  └────┬─────┘
      │             │             │
      └─────────────┼─────────────┘
                    │
                    ▼
          ┌──────────────────┐
          │    Multi-Hop     │
          │    Retrieval     │
          │                  │
          │  Hop 1: Vector   │
          │  Hop 2: Graph    │
          │  Hop 3: Exact    │
          └─────────┬────────┘
                    │
                    ▼
          ┌──────────────────┐
          │     Fusion &     │
          │    Reranking     │
          └─────────┬────────┘
                    │
                    ▼
          ┌──────────────────┐
          │  LLM Generation  │
          │   + Citations    │
          └──────────────────┘

Key Components
1. Data Pipeline Layer
Transforms raw text into multiple representations:
- Document chunks for vector search (300-500 tokens)
- Entity extractions (characters, locations, events)
- Structured triples for the graph (subject-relation-object)
- Metadata tags (timeline markers, entity types)
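A simplified way to picture those representations is one record per chunk plus one record per triple. The field names below are illustrative, not Ragforge's actual schema:

```python
# Illustrative shapes of the pipeline's outputs (field names are assumptions).
from dataclasses import dataclass, field

@dataclass
class Chunk:
    chunk_id: str
    text: str                                      # 300-500 token span of the source document
    embedding: list[float]                         # dense vector stored in pgvector
    metadata: dict = field(default_factory=dict)   # e.g. {"day": 11, "entity_types": ["Character"]}

@dataclass
class Triple:
    subject: str           # "Krishna"
    relation: str          # "ALLY_OF"
    obj: str               # "Arjuna"
    source_chunk_id: str   # provenance link back to the chunk it was extracted from

chunk = Chunk("mbh-0042", "On Day 11, Krishna continued as Arjuna's charioteer...", [0.12, -0.08], {"day": 11})
triple = Triple("Krishna", "ALLY_OF", "Arjuna", source_chunk_id="mbh-0042")
```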
2. Ontology / Knowledge Graph Layer
Maintains relationships:
- Nodes: Characters, Houses, Locations, Events, Battles
- Edges: ALLY_OF, PARTICIPATED_IN, LOCATED_AT, HAPPENED_AFTER
- Capabilities: Multi-hop traversal, relationship inference
3. Embeddings & Vector Store Layer
Dense semantic search:
- Sentence-level embeddings
- pgvector for ANN search
- Metadata filtering
- Hybrid search (vector + SQL)
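Hybrid search here means combining the vector ORDER BY with ordinary SQL predicates over the metadata in one query. A sketch under the same assumed chunks table as before, with metadata stored as JSONB:

```python
# Hybrid retrieval: vector ranking plus a plain SQL predicate on JSONB metadata, in one query.
import psycopg

def hybrid_search(conn: psycopg.Connection, vec_literal: str, after_day: int, k: int = 5):
    return conn.execute(
        """
        SELECT id, content, metadata->>'day' AS day
        FROM chunks
        WHERE (metadata->>'day')::int > %s          -- symbolic/temporal constraint
        ORDER BY embedding <=> %s::vector           -- semantic ranking
        LIMIT %s
        """,
        (after_day, vec_literal, k),
    ).fetchall()
```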
4. Retrieval Orchestrator (The Brain)
Coordinates all layers:
- Query parsing and intent detection
- Semantic and ontology expansion
- Multi-hop retrieval workflow
- Fusion and reranking
- Prompt construction
5. LLM Reasoning & Generation Layer
Grounded answer generation:
- Context injection
- Citation tracking
- Reasoning traces
- Safety filtering
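Much of grounded generation comes down to prompt assembly: number the retrieved chunks, instruct the model to cite them, and keep the mapping so citations can be rendered back to sources. A minimal sketch; the prompt wording and field names are illustrative, and any chat-completion client can consume the result:

```python
# Build a grounded prompt with numbered sources so the model's [n] citations can be
# traced back to specific chunks.
def build_grounded_prompt(question: str, chunks: list[dict]) -> tuple[str, dict[int, str]]:
    citation_map = {i + 1: c["chunk_id"] for i, c in enumerate(chunks)}
    context = "\n".join(f"[{i + 1}] {c['text']}" for i, c in enumerate(chunks))
    prompt = (
        "Answer the question using ONLY the numbered sources below.\n"
        "Cite every claim with its source number, e.g. [2].\n"
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return prompt, citation_map

prompt, citations = build_grounded_prompt(
    "Who helped Arjuna after Day 10 of the war?",
    [{"chunk_id": "mbh-0042", "text": "On Day 11, Krishna continued as Arjuna's charioteer..."}],
)
# response = llm_client.chat(prompt)  # placeholder: plug in whichever LLM client you deploy with
```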
6. UI & API Gateway Layer
User interaction:
- REST API
- Interactive UI
- Debug views
- Visualization panels
When Do You Need Multi-Layer RAG?
Not every application needs this complexity. Here’s a decision framework:
✅ You NEED Multi-Layer RAG When:
- Complex relationships matter: “Who influenced whom?”
- Temporal reasoning is required: “What happened after X?”
- Causal chains are important: “Why did X lead to Y?”
- Multi-document synthesis is needed: Answer spans multiple sources
- Domain has structured knowledge: Medical ontologies, legal precedents, enterprise taxonomies
- Accuracy is critical: Cost of wrong answers is high
Examples: Medical diagnosis support, legal case analysis, historical research, enterprise knowledge bases, technical troubleshooting
❌ You DON’T Need Multi-Layer RAG When:
- Simple factual lookups: “What is X?”
- Single-document answers: Everything is in one place
- No relationships matter: Documents are independent
- Speed over accuracy: Quick responses more important than perfect answers
- No graph structure: Your domain doesn’t have clear entities and relationships
Examples: FAQ chatbots, simple documentation search, product catalog search, blog content retrieval
Cost/Benefit Analysis
                     Complexity vs. Value

  High │                              ● Multi-Layer RAG
   V   │                                (High complexity,
   a   │                                 high value)
   l   │
   u   │              ● Hybrid RAG
   e   │                (Medium complexity,
       │                 medium value)
       │
       │  ● Simple Vector RAG
  Low  │    (Low complexity, low value)
       └────────────────────────────────────────────
          Low             Medium              High
                        Complexity

What’s Coming Next
In Part 2, we’ll dive deep into the retrieval engine and show you exactly how multi-hop retrieval works:
- Complete query flow walkthrough with the Arjuna example
- How semantic expansion works (with code)
- How ontology expansion leverages graph traversal
- The multi-hop retrieval algorithm step-by-step
- Fusion and reranking strategies
- Performance characteristics and benchmarks
You’ll see the actual queries, graph traversals, and fusion logic that makes multi-layer RAG work.
Key Takeaways
1. Vector search alone is insufficient for complex queries requiring relationships, temporal reasoning, or causal chains
2. Multi-layer RAG combines three approaches: semantic (vector), symbolic (graph), and multi-hop (iterative expansion)
3. Ragforge implements this pattern as a production-ready, scalable system
4. Not every application needs this complexity - use the decision framework to determine if multi-layer RAG is right for you
5. The payoff is accuracy and explainability - you can trace exactly where answers come from and why they’re relevant
Continue the Series
Part 1 (you are here) → Part 2: The Multi-Hop Retrieval Pipeline
Coming Up Next: We’ll trace a complete query from start to finish, showing you exactly how the retrieval orchestrator coordinates semantic search, graph traversal, and multi-hop expansion to answer complex questions.