
Part 1: Beyond Vector Search - Why RAG Needs Multiple Retrieval Layers

This is Part 1 of a 4-part series on building production-ready, multi-layer RAG systems with Ragforge.


Introduction: The State of RAG in 2025

Retrieval-Augmented Generation (RAG) has revolutionized how we build AI applications that need to answer questions based on private or specialized knowledge. The basic recipe is simple:

  1. Store your documents as vector embeddings
  2. Find semantically similar chunks for a user’s query
  3. Pass those chunks to an LLM
  4. Generate an answer

This approach works remarkably well for straightforward factual lookups. Ask “What is photosynthesis?” and a vector search will find relevant biology textbook passages. The LLM synthesizes a clear answer.
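The four-step recipe above fits in a few lines of Python. This is a minimal sketch, not a real implementation: `embed` here is a crude bag-of-letters stand-in for an actual embedding model, and step 4 returns the prompt instead of calling an LLM.

```python
def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model (e.g. a sentence transformer):
    # a bag-of-letters vector, just enough to make the sketch runnable.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def answer(query: str, chunks: list[str], top_k: int = 2) -> str:
    q = embed(query)                                        # 1. embed the query
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    context = "\n".join(ranked[:top_k])                     # 2. top-k similar chunks
    prompt = f"Context:\n{context}\n\nQuestion: {query}"    # 3. build the LLM prompt
    return prompt                                           # 4. a real system calls llm(prompt)
```

Everything downstream in this series is about what happens when step 2 — "find similar chunks" — isn't enough.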

But what happens when queries become more complex?


The Problem: When Vector Search Isn’t Enough

Consider this question about the Mahabharata epic:

“Who helped Arjuna after Day 10 of the war?”

A traditional RAG system using only vector search would:

  1. Embed the question into a vector
  2. Find the most semantically similar document chunks
  3. Pass those chunks to the LLM
  4. Hope the answer is in there

Why This Fails

This question requires understanding:

  • Temporal relationships: Events happening after a specific day
  • Social connections: Who was allied with whom
  • Causal chains: X happened because Y influenced Z
  • Multi-step reasoning: Finding information across multiple related documents

Vector search struggles here because:

  1. Semantic similarity ≠ Logical relevance

    • The most similar passages might mention “Arjuna” and “Day 10” but not discuss who helped him afterward
    • A passage about Arjuna’s allies from Day 5 is semantically similar but temporally wrong
  2. Timeline information is often implicit

    • “The next day, Krishna advised…” - What day was that?
    • Vector embeddings don’t capture temporal ordering well
  3. Relationship context is distributed

    • Arjuna’s allies are mentioned in one document
    • Battle timelines are in another
    • Specific aid after Day 10 is in a third
    • No single chunk contains the complete answer
  4. Causal chains require multi-hop reasoning

    • Krishna helped because he was Arjuna’s charioteer
    • Bhima helped because he was Arjuna’s brother
    • This “because” reasoning isn’t captured by cosine similarity

A Real Example of Vector Search Failure

Let’s visualize what happens:

Query: "Who helped Arjuna after Day 10 of the war?"
Query Embedding: [0.23, -0.41, 0.89, ...]
Top 3 Results by Cosine Similarity:
─────────────────────────────────────────────────────
Rank 1 (score: 0.92)
"Arjuna was one of the greatest warriors in the Kurukshetra war.
On Day 10, he fought valiantly against the Kaurava forces..."
Problem: Mentions Arjuna and Day 10, but says nothing about who helped him afterward
─────────────────────────────────────────────────────
Rank 2 (score: 0.89)
"Krishna served as Arjuna's charioteer throughout the war,
providing strategic guidance and moral support..."
Problem: Mentions help but no temporal marker for "after Day 10"
─────────────────────────────────────────────────────
Rank 3 (score: 0.87)
"After Day 14, when Arjuna was exhausted, his brother Bhima
stepped forward to protect the Pandava position..."
Closer! Has temporal marker and helper, but it's Day 14, not Day 10
─────────────────────────────────────────────────────

The right answer might be buried at rank 47 because it mentions different names or uses different phrasing.


What We Actually Need

To answer complex questions, we need a system that can:

1. Understand Entity Relationships

Arjuna -[ALLY_OF]-> Krishna
Arjuna -[BROTHER_OF]-> Bhima
Arjuna -[PARTICIPATED_IN]-> Kurukshetra War

When we search for “Arjuna,” the system should know to also look at Krishna and Bhima’s activities.
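Mechanically, this expansion is just a neighbourhood lookup in the graph. A toy sketch, using an in-memory adjacency list in place of a real graph database like Neo4j (the graph contents mirror the triples above):

```python
# Hypothetical mini knowledge graph as an adjacency list of typed edges.
GRAPH = {
    "Arjuna": [
        ("ALLY_OF", "Krishna"),
        ("BROTHER_OF", "Bhima"),
        ("PARTICIPATED_IN", "Kurukshetra War"),
    ],
}

def expand_entities(entity, relations=None):
    """Return neighbours of `entity`, optionally restricted to some relation types."""
    neighbours = set()
    for rel, target in GRAPH.get(entity, []):
        if relations is None or rel in relations:
            neighbours.add(target)
    return neighbours
```

A query mentioning "Arjuna" can then fan out to `expand_entities("Arjuna", {"ALLY_OF", "BROTHER_OF"})` and retrieve chunks about Krishna and Bhima as well.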

2. Reason About Time and Sequence

Day 10: [Event A, Event B]
Day 11: [Event C] ← Search here for "after Day 10"
Day 12: [Event D]

The system needs to filter results based on temporal constraints, not just semantic similarity.
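If chunks carry day metadata (as in the pipeline described later), the temporal constraint becomes an ordinary predicate rather than a similarity problem. A sketch, assuming each chunk has a `day` field:

```python
def after_day(chunks: list[dict], day: int) -> list[dict]:
    """Keep only chunks whose metadata places them strictly after `day`."""
    return [c for c in chunks if c.get("day") is not None and c["day"] > day]

# The timeline from the diagram above, as tagged chunks.
events = [
    {"day": 10, "text": "Event A"},
    {"day": 10, "text": "Event B"},
    {"day": 11, "text": "Event C"},
    {"day": 12, "text": "Event D"},
]
```

`after_day(events, 10)` returns only Events C and D — no embedding can make that guarantee.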

3. Traverse Multiple Hops of Information

Hop 1: Find documents about Arjuna (semantic search)
Hop 2: Expand to his known allies (graph traversal)
Hop 3: Filter for events after Day 10 (temporal filter)
Result: "Krishna continued as charioteer, Bhima protected flanks"
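Chained together, the three hops look like this. Everything here is a stand-in: Hop 1 uses a substring match where a real system would use vector search, and the graph and chunks are toy data.

```python
# Hypothetical graph and pre-tagged chunks for the Arjuna example.
GRAPH = {"Arjuna": [("ALLY_OF", "Krishna"), ("BROTHER_OF", "Bhima")]}

CHUNKS = [
    {"text": "Arjuna fought valiantly on Day 10", "day": 10},
    {"text": "Krishna continued as charioteer", "day": 11},
    {"text": "Bhima protected the flanks", "day": 11},
    {"text": "Krishna advised patience", "day": 5},
]

def multi_hop(entity, graph, chunks, after):
    # Hop 1: find chunks mentioning the entity itself (semantic search stand-in)
    hits = [c for c in chunks if entity in c["text"]]
    # Hop 2: expand to graph neighbours and pull their chunks too
    neighbours = {target for _, target in graph.get(entity, [])}
    hits += [c for c in chunks if any(n in c["text"] for n in neighbours)]
    # Hop 3: apply the temporal constraint
    return [c["text"] for c in hits if c["day"] > after]
```

Note that the winning chunks never mention "Arjuna" or "Day 10" at all — they are reachable only through the graph hop plus the temporal filter.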

4. Combine Multiple Evidence Sources

  • Semantic: “Arjuna needed support”
  • Symbolic: Krishna is linked to Arjuna via ALLY_OF relationship
  • Temporal: Document has metadata: {day: 11, after_day_10: true}
  • Synthesis: All three signals point to the right answer
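One simple way to synthesize these signals is a weighted linear blend. The weights below are purely illustrative, not anything Ragforge prescribes:

```python
def fused_score(chunk, allies, after, weights=(0.5, 0.3, 0.2)):
    """Blend semantic, symbolic, and temporal signals into one score."""
    w_sem, w_sym, w_tmp = weights
    semantic = chunk["similarity"]                              # from the vector layer
    symbolic = 1.0 if any(a in chunk["text"] for a in allies) else 0.0  # graph signal
    temporal = 1.0 if chunk.get("day", -1) > after else 0.0     # metadata signal
    return w_sem * semantic + w_sym * symbolic + w_tmp * temporal
```

Under this scheme, a chunk with a modest similarity score but the right ally and the right timeframe beats a chunk that is semantically closer but temporally wrong — exactly the failure mode from the Rank 1 result earlier.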

Introducing Multi-Layer RAG

What if we could combine three complementary approaches?

The Three Pillars

┌─────────────────────────────────────────────────────────┐
│ MULTI-LAYER RAG │
├─────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ SEMANTIC │ │ SYMBOLIC │ │ MULTI-HOP │ │
│ │ │ │ │ │ │ │
│ │ Vector │ │ Knowledge │ │ Iterative │ │
│ │ Embeddings │ │ Graph │ │ Context │ │
│ │ │ │ │ │ Expansion │ │
│ │ pgvector │ │ Neo4j │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────┘

1. Semantic Layer (Dense Retrieval)

  • What: Traditional vector search using embeddings
  • When: First-hop recall, finding semantically similar content
  • How: Sentence transformers → pgvector → cosine similarity
  • Strength: Great for finding passages about the same topic

2. Symbolic Layer (Graph Reasoning)

  • What: Knowledge graph with entities and relationships
  • When: Understanding connections, hierarchies, and constraints
  • How: Entity extraction → graph database → traversal queries
  • Strength: Captures relationships that aren’t semantically similar

3. Multi-Hop Layer (Iterative Expansion)

  • What: Progressively expand context through multiple retrieval rounds
  • When: Answer requires information from multiple sources
  • How: Hop 1 (vector) → Hop 2 (graph) → Hop 3 (exact match)
  • Strength: Finds distributed information and synthesizes context

The Ragforge Approach

Ragforge is a production-ready implementation of multi-layer RAG that orchestrates all three layers intelligently. It is built as a set of RAG microservices designed to be easy to deploy and operate.

High-Level Architecture

┌─────────────┐
│ Query │
└──────┬──────┘
┌────────────────────────┐
│ Query Understanding │
│ (Parse, Extract, │
│ Classify Intent) │
└────────────┬───────────┘
┌────────────────┼────────────────┐
│ │ │
▼ ▼ ▼
┌──────────┐ ┌───────────┐ ┌──────────┐
│ Semantic │ │ Ontology │ │ Temporal │
│Expansion │ │Expansion │ │ Filtering│
└─────┬────┘ └─────┬─────┘ └────┬─────┘
│ │ │
│ │ │
└───────────────┼────────────────┘
┌──────────────────┐
│ Multi-Hop │
│ Retrieval │
│ │
│ Hop 1: Vector │
│ Hop 2: Graph │
│ Hop 3: Exact │
└─────────┬────────┘
┌──────────────────┐
│ Fusion & │
│ Reranking │
└─────────┬────────┘
┌──────────────────┐
│ LLM Generation │
│ + Citations │
└──────────────────┘

Key Components

1. Data Pipeline Layer

Transforms raw text into multiple representations:

  • Document chunks for vector search (300-500 tokens)
  • Entity extractions (characters, locations, events)
  • Structured triples for the graph (subject-relation-object)
  • Metadata tags (timeline markers, entity types)
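The chunking step can be sketched with a greedy sliding window. This version counts words rather than tokens (a rough proxy for the 300-500 token target), and the overlap size is an illustrative choice:

```python
def chunk_words(text: str, target: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows of roughly `target` words.

    Word counts stand in for token counts here; a real pipeline would use
    the embedding model's own tokenizer.
    """
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = min(start + target, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # overlap preserves context across boundaries
    return chunks
```

The overlap matters: without it, a sentence like "The next day, Krishna advised…" can be split away from the passage that establishes which day "the next day" is.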

2. Ontology / Knowledge Graph Layer

Maintains relationships:

  • Nodes: Characters, Houses, Locations, Events, Battles
  • Edges: ALLY_OF, PARTICIPATED_IN, LOCATED_AT, HAPPENED_AFTER
  • Capabilities: Multi-hop traversal, relationship inference

3. Embeddings & Vector Store Layer

Dense semantic search:

  • Sentence-level embeddings
  • pgvector for ANN search
  • Metadata filtering
  • Hybrid search (vector + SQL)
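Hybrid search means the SQL predicate narrows the candidate set before (or alongside) the vector ranking. A pure-Python analogue of a pgvector query along the lines of `SELECT * FROM chunks WHERE day > %s ORDER BY embedding <=> %s LIMIT %s` — the schema and column names are hypothetical:

```python
def hybrid_search(rows, qvec, min_day, top_k=2):
    """Metadata predicate first, vector ranking second — the hybrid pattern."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    filtered = [r for r in rows if r["day"] > min_day]      # SQL WHERE clause
    return sorted(filtered,                                  # ORDER BY distance
                  key=lambda r: cos(qvec, r["embedding"]),
                  reverse=True)[:top_k]                      # LIMIT
```

In production, pgvector does the ranking inside Postgres with an ANN index, so the filter and the similarity search run in a single query.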

4. Retrieval Orchestrator (The Brain)

Coordinates all layers:

  • Query parsing and intent detection
  • Semantic and ontology expansion
  • Multi-hop retrieval workflow
  • Fusion and reranking
  • Prompt construction
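One common way to fuse results from heterogeneous retrievers — used here only as an example; the source doesn't specify Ragforge's exact fusion formula — is Reciprocal Rank Fusion, which merges ranked lists without needing their scores to be comparable:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank) per document.

    `k` dampens the influence of top ranks; 60 is the commonly cited default.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears in both the vector layer's ranking and the graph layer's ranking accumulates score from each, so agreement between layers is rewarded even when neither layer ranked it first.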

5. LLM Reasoning & Generation Layer

Grounded answer generation:

  • Context injection
  • Citation tracking
  • Reasoning traces
  • Safety filtering

6. UI & API Gateway Layer

User interaction:

  • REST API
  • Interactive UI
  • Debug views
  • Visualization panels

When Do You Need Multi-Layer RAG?

Not every application needs this complexity. Here’s a decision framework:

✅ You NEED Multi-Layer RAG When:

  • Complex relationships matter: “Who influenced whom?”
  • Temporal reasoning is required: “What happened after X?”
  • Causal chains are important: “Why did X lead to Y?”
  • Multi-document synthesis is needed: Answer spans multiple sources
  • Domain has structured knowledge: Medical ontologies, legal precedents, enterprise taxonomies
  • Accuracy is critical: Cost of wrong answers is high

Examples: Medical diagnosis support, legal case analysis, historical research, enterprise knowledge bases, technical troubleshooting

❌ You DON’T Need Multi-Layer RAG When:

  • Simple factual lookups: “What is X?”
  • Single-document answers: Everything is in one place
  • No relationships matter: Documents are independent
  • Speed over accuracy: Quick responses more important than perfect answers
  • No graph structure: Your domain doesn’t have clear entities and relationships

Examples: FAQ chatbots, simple documentation search, product catalog search, blog content retrieval

Cost/Benefit Analysis

┌─────────────────────────────────────────────────────────┐
│ Complexity vs. Value │
├─────────────────────────────────────────────────────────┤
│ │
│ High │ ● Multi-Layer RAG │
│ V │ (High complexity, │
│ a │ high value) │
│ l │ │
│ u │ ● Hybrid RAG │
│ e │ (Medium complexity, │
│ │ medium value) │
│ │ │
│ │ ● Simple Vector RAG │
│ Low │ (Low complexity, low value) │
│ └──────────────────────────────────────────── │
│ Low Medium High │
│ Complexity │
└─────────────────────────────────────────────────────────┘

What’s Coming Next

In Part 2, we’ll dive deep into the retrieval engine and show you exactly how multi-hop retrieval works:

  • Complete query flow walkthrough with the Arjuna example
  • How semantic expansion works (with code)
  • How ontology expansion leverages graph traversal
  • The multi-hop retrieval algorithm step-by-step
  • Fusion and reranking strategies
  • Performance characteristics and benchmarks

You’ll see the actual queries, graph traversals, and fusion logic that makes multi-layer RAG work.


Key Takeaways

  1. Vector search alone is insufficient for complex queries requiring relationships, temporal reasoning, or causal chains

  2. Multi-layer RAG combines three approaches: semantic (vector), symbolic (graph), and multi-hop (iterative expansion)

  3. Ragforge implements this pattern as a production-ready, scalable system

  4. Not every application needs this complexity - use the decision framework to determine if multi-layer RAG is right for you

  5. The payoff is accuracy and explainability - you can trace exactly where answers come from and why they’re relevant



Continue the Series

Part 1 (you are here) → Part 2: The Multi-Hop Retrieval Pipeline

Coming Up Next: We’ll trace a complete query from start to finish, showing you exactly how the retrieval orchestrator coordinates semantic search, graph traversal, and multi-hop expansion to answer complex questions.