
Part 2: The Multi-Hop Retrieval Pipeline - From Query to Answer

This is Part 2 of a 4-part series on building production-ready, multi-layer RAG systems with Ragforge.


Recap: The Promise of Multi-Layer RAG

In Part 1, we explored why traditional vector-only RAG systems struggle with complex queries that require:

  • Understanding entity relationships
  • Reasoning about time and sequences
  • Traversing multiple information hops
  • Combining evidence from multiple sources

Now let’s see exactly how multi-layer RAG solves these problems. We’ll trace a complete query through Ragforge’s retrieval pipeline, showing you every step, every decision, and every piece of data.


The Complete Query Flow

Let’s use our example from Part 1:

Query: “Who helped Arjuna after Day 10 of the war?”

This query will flow through seven major steps:

Query → Parse → Expand → Retrieve → Fuse → Generate → Answer
         (1)     (2-3)      (4)      (5)     (6-7)

Let’s dive into each step.


Step 1: Query Understanding & Parsing

The first step is understanding what the user is actually asking.

Entity Extraction

# Example output from QueryParser
{
    "raw_query": "Who helped Arjuna after Day 10 of the war?",
    "entities": [
        {
            "text": "Arjuna",
            "type": "CHARACTER",
            "confidence": 0.98
        },
        {
            "text": "Day 10",
            "type": "TEMPORAL_MARKER",
            "confidence": 0.95
        },
        {
            "text": "war",
            "type": "EVENT",
            "context": "Kurukshetra War",
            "confidence": 0.92
        }
    ],
    "intent": "temporal_causal",
    "query_type": "who_question",
    "temporal_constraint": {
        "type": "after",
        "reference": "Day 10"
    },
    "expected_answer_type": "CHARACTER"
}

Intent Classification

The query parser identifies this as a temporal-causal query because:

  • It has a temporal constraint (“after Day 10”)
  • It asks about causation/help (“who helped”)
  • It requires relationship understanding

This classification determines how the retrieval pipeline prioritizes different layers.
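A minimal rule-based sketch of this classification step might look like the following; the cue lists and labels are illustrative, not Ragforge's actual rules:

```python
import re

# Illustrative cue patterns; a production parser would use an
# NER/intent model rather than keyword lists.
TEMPORAL_CUES = re.compile(r"\b(after|before|during|following|since)\b", re.I)
CAUSAL_CUES = re.compile(r"\b(helped|caused|led to|because|why|resulted)\b", re.I)

def classify_intent(query: str) -> str:
    has_temporal = bool(TEMPORAL_CUES.search(query))
    has_causal = bool(CAUSAL_CUES.search(query))
    if has_temporal and has_causal:
        return "temporal_causal"
    if has_temporal:
        return "temporal"
    if has_causal:
        return "causal"
    return "factual"

print(classify_intent("Who helped Arjuna after Day 10 of the war?"))  # → temporal_causal
```

Even this crude version correctly routes our example query: "after" triggers the temporal cue and "helped" the causal one.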

Query Type Detection

# Query type determines retrieval strategy
QUERY_TYPES = {
    "who_question": {
        "priority": ["ontology", "vector", "exact"],
        "graph_relations": ["ALLY_OF", "HELPED", "SUPPORTED"],
        "temporal_aware": True
    },
    "what_question": {
        "priority": ["vector", "ontology", "exact"],
        "emphasis": "semantic_similarity"
    },
    "why_question": {
        "priority": ["ontology", "vector"],
        "graph_relations": ["CAUSED", "LED_TO", "RESULTED_IN"],
        "causal_aware": True
    }
}

For our “who_question,” the system will prioritize ontology (graph) reasoning.


Step 2: Semantic Expansion

Now we expand the query semantically to improve recall.

Synonym Generation

{
    "expansions": {
        "Arjuna": [
            "Arjun",
            "Pandava",
            "third Pandava brother",
            "Partha",      # another name for Arjuna
            "Dhananjaya"   # another epithet
        ],
        "helped": [
            "supported",
            "assisted",
            "aided",
            "protected",
            "allied with",
            "fought alongside",
            "advised"
        ],
        "war": [
            "Kurukshetra war",
            "battle",
            "Mahabharata war",
            "conflict",
            "Kurukshetra battle"
        ]
    }
}

Semantic Subquery Generation

# Generate multiple semantic variants
semantic_queries = [
    "Who helped Arjuna after Day 10 of the war?",  # original
    "Who supported Arjuna following Day 10 of Kurukshetra?",
    "Which allies aided Arjuna in battles after Day 10?",
    "Who fought alongside Arjuna post Day 10?"
]

# Each variant is embedded and searched
results = []
for query in semantic_queries:
    embedding = embed(query)
    results.extend(vector_search(embedding))

This increases recall by capturing different phrasings of the same question.


Step 3: Ontology Expansion

Now we leverage the knowledge graph to understand relationships.

Graph Query Construction

// Find Arjuna and his connections
MATCH (arjuna:Character {name: "Arjuna"})
      -[r:ALLY_OF|BROTHER_OF|PROTECTED_BY|ADVISED_BY]-(helper:Character)
RETURN helper.name, type(r) AS relationship

// Find events Arjuna participated in after Day 10
MATCH (arjuna:Character {name: "Arjuna"})
      -[:PARTICIPATED_IN]->(event:Battle)
WHERE event.day > 10
RETURN event

// Find who else participated in those events
MATCH (event:Battle)<-[:PARTICIPATED_IN]-(other:Character)
WHERE event.day > 10 AND other.name <> "Arjuna"
RETURN other.name, event.day

Ontology Expansion Results

{
    "primary_entity": "Arjuna",
    "direct_connections": [
        {
            "entity": "Krishna",
            "relationship": "ALLY_OF",
            "roles": ["CHARIOTEER", "ADVISOR"],
            "relevance": 0.95
        },
        {
            "entity": "Bhima",
            "relationship": "BROTHER_OF",
            "roles": ["WARRIOR", "PROTECTOR"],
            "relevance": 0.88
        },
        {
            "entity": "Yudhishthira",
            "relationship": "BROTHER_OF",
            "roles": ["KING", "LEADER"],
            "relevance": 0.72
        }
    ],
    "events_after_day_10": [
        {
            "event": "Day 11 Battle",
            "participants": ["Arjuna", "Krishna", "Bhima", "Drona"],
            "day": 11
        },
        {
            "event": "Day 12 Battle",
            "participants": ["Arjuna", "Krishna", "Bhima"],
            "day": 12
        }
    ],
    "expanded_search_entities": [
        "Krishna",
        "Bhima",
        "Yudhishthira",
        "Day 11 Battle",
        "Day 12 Battle"
    ]
}

These expanded entities will guide the next retrieval hop.


Step 4: Multi-Hop Retrieval

Now comes the core of the system: multi-hop retrieval combining all layers.

Hop 1: Vector Search

# Vector search with the original query
query_embedding = embed("Who helped Arjuna after Day 10 of the war?")

# SQL query against pgvector
results_hop1 = vector_db.query("""
    SELECT
        id,
        title,
        content,
        metadata,
        entities,
        1 - (embedding <=> %s) AS similarity
    FROM documents
    WHERE
        metadata->>'entity_type' = 'battle_narrative'
    ORDER BY embedding <=> %s
    LIMIT 20
""", (query_embedding, query_embedding))

Hop 1 Results (top 5):

[
    {
        "id": "doc_089",
        "similarity": 0.87,
        "content": "The Kurukshetra war entered its eleventh day with renewed intensity. Arjuna, still grieving from the previous day's losses...",
        "entities": ["Arjuna", "Day 11", "Kurukshetra"],
        "metadata": {"day": 11, "character": "Arjuna"}
    },
    {
        "id": "doc_134",
        "similarity": 0.84,
        "content": "Krishna continued to guide Arjuna through the darkest moments of the war. His counsel was invaluable...",
        "entities": ["Krishna", "Arjuna"],
        "metadata": {"character": ["Krishna", "Arjuna"]}
    }
    # ... 18 more results
]

Hop 2: Graph-Guided Retrieval

Using the ontology expansion results, retrieve documents about related entities.

# Retrieve documents about Krishna (ally from graph)
results_hop2_krishna = vector_db.query("""
    SELECT *
    FROM documents
    WHERE
        entities @> '["Krishna"]'::jsonb
        AND (metadata->>'day')::int > 10
    ORDER BY embedding <=> %s
    LIMIT 10
""", (query_embedding,))

# Retrieve documents about Bhima (brother from graph)
results_hop2_bhima = vector_db.query("""
    SELECT *
    FROM documents
    WHERE
        entities @> '["Bhima"]'::jsonb
        AND (metadata->>'day')::int > 10
    ORDER BY embedding <=> %s
    LIMIT 10
""", (query_embedding,))

# Retrieve documents about Day 11-12 events (from graph)
results_hop2_events = vector_db.query("""
    SELECT *
    FROM documents
    WHERE
        (metadata->>'day')::int IN (11, 12)
        AND entities ?| array['Arjuna', 'Krishna', 'Bhima']
    ORDER BY embedding <=> %s
    LIMIT 10
""", (query_embedding,))

Hop 2 Results (examples):

[
    {
        "id": "doc_156",
        "source": "krishna_expansion",
        "content": "On the morning of Day 11, Krishna spoke to Arjuna: 'Today we must break through their center...'",
        "entities": ["Krishna", "Arjuna", "Day 11"],
        "metadata": {"day": 11, "type": "dialogue"}
    },
    {
        "id": "doc_203",
        "source": "bhima_expansion",
        "content": "Bhima, seeing his brother's exhaustion, took position at Arjuna's left flank on Day 12...",
        "entities": ["Bhima", "Arjuna", "Day 12"],
        "metadata": {"day": 12, "action": "protection"}
    }
]

Notice how Hop 2 finds highly relevant documents that weren’t in Hop 1’s top results!

Hop 3: Exact Match & Metadata Filtering

For temporal queries, exact matching on metadata is crucial.

# Exact temporal filter
results_hop3 = vector_db.query("""
    SELECT *
    FROM documents
    WHERE
        (metadata->>'day')::int > 10
        AND (
            content ILIKE '%helped Arjuna%'
            OR content ILIKE '%assisted Arjuna%'
            OR content ILIKE '%supported Arjuna%'
        )
        AND entities ? 'Arjuna'
    ORDER BY (metadata->>'day')::int ASC
    LIMIT 10
""")

Hop 3 Results:

[
    {
        "id": "doc_287",
        "match_type": "exact",
        "content": "Day 11 saw Krishna intensify his strategic guidance. His advice helped Arjuna overcome his doubt...",
        "metadata": {"day": 11, "exact_match": "helped Arjuna"}
    },
    {
        "id": "doc_301",
        "match_type": "exact",
        "content": "Throughout Day 12 and 13, Bhima protected Arjuna's position, allowing him to focus on long-range attacks...",
        "metadata": {"day": 12, "exact_match": "protected Arjuna"}
    }
]

Combined Results

# Total result counts from all hops
total_results = {
    "hop_1_vector": 20,
    "hop_2_graph_krishna": 10,
    "hop_2_graph_bhima": 10,
    "hop_2_graph_events": 10,
    "hop_3_exact": 10,
    "total_before_dedup": 60
}

Step 5: Fusion & Reranking

Now we need to merge, deduplicate, and rank these 60 results.

Deduplication

# Remove duplicate documents
unique_results = deduplicate_by_id(all_results)
# 60 results → 42 unique documents
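`deduplicate_by_id` can be as simple as keeping the best-scoring copy of each document ID. A minimal sketch, assuming each result dict carries `id` and `similarity` fields:

```python
def deduplicate_by_id(results: list[dict]) -> list[dict]:
    """Keep one copy per document ID, preferring the higher similarity score."""
    best: dict[str, dict] = {}
    for doc in results:
        doc_id = doc["id"]
        current = best.get(doc_id)
        if current is None or doc.get("similarity", 0.0) > current.get("similarity", 0.0):
            best[doc_id] = doc
    return list(best.values())

hits = [
    {"id": "doc_089", "similarity": 0.87},
    {"id": "doc_156", "similarity": 0.76},
    {"id": "doc_089", "similarity": 0.62},  # same doc surfaced by another hop
]
print(len(deduplicate_by_id(hits)))  # → 2
```

Keeping the highest-scoring copy matters because the same document often arrives from multiple hops with different similarity scores.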

Multi-Signal Scoring

Each document gets scored based on multiple signals:

def compute_fusion_score(doc, query_info):
    scores = {
        # Semantic similarity (from vector search), 0-1
        "semantic": doc.similarity_score,
        # Graph centrality (from ontology)
        "graph": compute_graph_score(doc.entities, query_info.entities),
        # Temporal relevance
        "temporal": compute_temporal_score(
            doc.metadata.get('day'),
            query_info.temporal_constraint
        ),
        # Exact match bonus
        "exact": 1.0 if doc.match_type == "exact" else 0.0,
        # Hop priority (earlier hops slightly preferred)
        "hop": 1.0 if doc.hop == 1 else 0.9 if doc.hop == 2 else 0.8
    }

    # Weighted combination
    weights = {
        "semantic": 0.25,
        "graph": 0.30,     # higher for "who" questions
        "temporal": 0.25,  # higher for "after" queries
        "exact": 0.15,
        "hop": 0.05
    }
    final_score = sum(scores[k] * weights[k] for k in scores)
    return final_score
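One plausible implementation of the two helper scorers used above; the exact formulas here are illustrative assumptions, not Ragforge's production logic:

```python
def compute_graph_score(doc_entities: list[str], query_entities: list[str]) -> float:
    """Fraction of graph-expanded query entities that appear in the document."""
    if not query_entities:
        return 0.0
    doc_set = set(doc_entities)
    overlap = sum(1 for e in query_entities if e in doc_set)
    return overlap / len(query_entities)

def compute_temporal_score(doc_day, constraint: dict) -> float:
    """1.0 for days just after the reference, decaying as the gap grows."""
    if doc_day is None or constraint is None:
        return 0.0
    reference_day = 10  # parsed from constraint["reference"] in the real system
    if constraint.get("type") == "after":
        gap = doc_day - reference_day
        if gap <= 0:
            return 0.0  # violates the "after" constraint
        # Day 11 → 1.0, Day 12 → 0.95, and so on
        return max(0.0, 1.0 - 0.05 * (gap - 1))
    return 0.5  # neutral score for constraint types this sketch ignores

print(compute_temporal_score(12, {"type": "after", "reference": "Day 10"}))  # → 0.95
```

With this decay, Day 11 scores a perfect 1.0 and Day 12 scores 0.95, matching the example scores shown below.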

Example Scoring

# Document 1: High semantic, low graph
doc_089 = {
    "content": "Day 11 battle description...",
    "scores": {
        "semantic": 0.87,
        "graph": 0.45,    # doesn't mention key allies
        "temporal": 1.0,  # perfect temporal match
        "exact": 0.0,
        "hop": 1.0
    },
    "final_score": 0.71
}

# Document 2: Medium semantic, high graph
doc_156 = {
    "content": "Krishna spoke to Arjuna on Day 11...",
    "scores": {
        "semantic": 0.76,
        "graph": 0.95,  # Krishna is a key ally from the graph
        "temporal": 1.0,
        "exact": 0.0,
        "hop": 0.9
    },
    "final_score": 0.85  # WINS!
}

# Document 3: Low semantic, high graph + exact
doc_203 = {
    "content": "Bhima protected Arjuna's position...",
    "scores": {
        "semantic": 0.68,
        "graph": 0.88,     # Bhima is a key ally from the graph
        "temporal": 0.95,  # Day 12 is after 10, slight delay
        "exact": 1.0,      # contains "protected Arjuna"
        "hop": 0.9
    },
    "final_score": 0.87  # HIGHEST!
}

Cross-Encoder Reranking (Optional)

For even better results, use a cross-encoder model:

from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# Rerank the top 15 from fusion
pairs = [(query, doc.content) for doc in top_15]
rerank_scores = reranker.predict(pairs)

# Combine with fusion scores
final_scores = [
    0.7 * fusion_score + 0.3 * rerank_score
    for fusion_score, rerank_score in zip(fusion_scores, rerank_scores)
]

Final Ranked Context

top_10_context = [
    {
        "rank": 1,
        "doc_id": "doc_203",
        "score": 0.87,
        "content": "Bhima protected Arjuna's position on Day 12...",
        "why_relevant": ["graph_ally", "temporal_match", "exact_match"]
    },
    {
        "rank": 2,
        "doc_id": "doc_156",
        "score": 0.85,
        "content": "Krishna spoke to Arjuna on Day 11...",
        "why_relevant": ["graph_ally", "temporal_match", "high_semantic"]
    },
    {
        "rank": 3,
        "doc_id": "doc_301",
        "score": 0.82,
        "content": "Throughout Days 12-13, Bhima provided cover...",
        "why_relevant": ["graph_ally", "temporal_match"]
    },
    # ... 7 more
]

Step 6: Prompt Construction

Now we build an optimized prompt for the LLM.

Prompt Template

prompt_template = """
You are an expert on the Mahabharata epic. Answer the question based ONLY on the provided context.
Question: {query}
Context:
{context}
Instructions:
1. Answer the question directly and concisely
2. Use ONLY information from the provided context
3. Cite sources using [doc_id] notation
4. If the context doesn't fully answer the question, say so
5. List all helpers mentioned, with their roles
Answer:
"""

Context Assembly

context = ""
for doc in top_10_context:
    context += f"\n[{doc['doc_id']}] (Score: {doc['score']:.2f})\n"
    context += f"{doc['content']}\n"
    context += f"Relevance: {', '.join(doc['why_relevant'])}\n"
    context += "---\n"

Final Prompt

You are an expert on the Mahabharata epic. Answer the question based ONLY on the provided context.
Question: Who helped Arjuna after Day 10 of the war?
Context:
[doc_203] (Score: 0.87)
Bhima protected Arjuna's position on Day 12, allowing his brother to focus on long-range attacks against the Kaurava formation. His mace cleared the path forward.
Relevance: graph_ally, temporal_match, exact_match
---
[doc_156] (Score: 0.85)
Krishna spoke to Arjuna on Day 11: "Today we must break through their center. I will guide you." His strategic counsel helped Arjuna overcome doubt and hesitation.
Relevance: graph_ally, temporal_match, high_semantic
---
[doc_301] (Score: 0.82)
Throughout Days 12-13, Bhima provided cover while Krishna continued his role as charioteer and advisor, guiding Arjuna through difficult tactical decisions.
Relevance: graph_ally, temporal_match
---
[Additional context documents...]
Instructions:
1. Answer the question directly and concisely
2. Use ONLY information from the provided context
3. Cite sources using [doc_id] notation
4. If the context doesn't fully answer the question, say so
5. List all helpers mentioned, with their roles
Answer:

Step 7: LLM Generation & Citation

Send the prompt to the LLM and receive a grounded answer:

LLM Response

{
    "answer": "After Day 10 of the Kurukshetra war, Arjuna received help from two key individuals:\n\n1. **Krishna** - Continued his role as Arjuna's charioteer and strategic advisor. On Day 11, Krishna provided crucial tactical counsel that helped Arjuna overcome doubt [doc_156]. Throughout Days 12-13, Krishna guided Arjuna through difficult decisions [doc_301].\n\n2. **Bhima** - Arjuna's brother protected his position on Day 12, using his mace to clear paths and allowing Arjuna to focus on long-range attacks [doc_203]. Bhima continued providing defensive cover throughout Days 12-13 [doc_301].\n\nBoth Krishna (as advisor/charioteer) and Bhima (as protector/warrior) played essential roles in supporting Arjuna during the battles following Day 10.",
    "citations": [
        {
            "doc_id": "doc_156",
            "claim": "Krishna provided tactical counsel on Day 11",
            "relevance": "direct_answer"
        },
        {
            "doc_id": "doc_203",
            "claim": "Bhima protected Arjuna's position on Day 12",
            "relevance": "direct_answer"
        },
        {
            "doc_id": "doc_301",
            "claim": "Both continued support through Days 12-13",
            "relevance": "supporting_evidence"
        }
    ],
    "confidence": 0.92,
    "retrieval_trace": {
        "hops": 3,
        "total_docs_considered": 42,
        "final_context_docs": 10,
        "semantic_contribution": 0.25,
        "graph_contribution": 0.50,
        "temporal_contribution": 0.25
    }
}

Why This Works: Key Advantages

1. Relationship-Aware

The graph expansion found Krishna and Bhima as key allies before semantic search. Without the graph:

  • Krishna might be rank 15 (mentioned in many contexts)
  • Bhima might be rank 30 (less semantically similar to “help”)

2. Temporal-Sensitive

The temporal filter (“after Day 10”) ensured we only considered Days 11+. Without this:

  • Documents about earlier days would score highly
  • “Helped before Day 10” would confuse the answer

3. Multi-Hop Context

Different information came from different hops:

  • Hop 1 found general battle narratives
  • Hop 2 found specific ally actions (via graph)
  • Hop 3 found exact phrases about helping

4. Explainable & Traceable

Every fact in the answer can be traced:

  • “Krishna provided counsel” ← doc_156 (rank 2, score 0.85)
  • “Bhima protected position” ← doc_203 (rank 1, score 0.87)
  • Why ranked this way: graph_ally + temporal_match + exact_match
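Because every claim carries a `[doc_id]` marker, this traceability can be checked mechanically. A small sketch (the field names and regex are assumptions) that extracts citations from an answer and flags any that point outside the retrieved context:

```python
import re

def verify_citations(answer: str, context_doc_ids: set[str]) -> dict:
    """Extract [doc_id] citations and flag any not present in the context."""
    cited = set(re.findall(r"\[(doc_\d+)\]", answer))
    return {
        "cited": sorted(cited),
        "unsupported": sorted(cited - context_doc_ids),
    }

answer = "Krishna advised Arjuna [doc_156]; Bhima protected him [doc_203]."
report = verify_citations(answer, {"doc_156", "doc_203", "doc_301"})
print(report["unsupported"])  # → []
```

A non-empty `unsupported` list is a cheap hallucination signal: the LLM cited a document it was never given.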

Performance Characteristics

Latency Breakdown

For the example query:

┌──────────────────────────────────────────────┐
│ Latency Profile                              │
├──────────────────────────────────────────────┤
│ Query Parsing:            10ms               │
│ Semantic Expansion:       15ms               │
│ Ontology Expansion:      120ms (graph DB)    │
│ Hop 1 (Vector):           45ms               │
│ Hop 2 (3 queries):       130ms (parallel)    │
│ Hop 3 (Exact):            25ms               │
│ Deduplication:             5ms               │
│ Fusion Scoring:           30ms               │
│ Cross-Encoder Rerank:    180ms (optional)    │
│ Prompt Construction:      10ms               │
│ LLM Generation:         2500ms               │
├──────────────────────────────────────────────┤
│ TOTAL (without rerank):  ~2.9s               │
│ TOTAL (with rerank):     ~3.1s               │
└──────────────────────────────────────────────┘

Optimization Strategies

  1. Parallel Execution: Hops 1, 2, and 3 can run concurrently
  2. Caching: Cache ontology expansions for common entities
  3. Early Stopping: If Hop 1 has high-confidence exact matches, skip Hop 3
  4. Adaptive Depth: Simple queries use fewer hops
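Strategies 1 and 2 can be sketched with `asyncio` and an LRU cache; the hop functions below are stand-ins for the real retrieval calls:

```python
import asyncio
from functools import lru_cache

@lru_cache(maxsize=1024)
def expand_ontology(entity: str) -> tuple:
    # Stand-in for the graph-DB call; caching avoids repeated
    # lookups for common entities like "Arjuna".
    return (f"{entity}_ally_1", f"{entity}_ally_2")

async def hop(name: str, delay: float) -> str:
    # Stand-in for a real retrieval call (vector, graph, or exact).
    await asyncio.sleep(delay)
    return name

async def retrieve_all():
    # The three hops are independent, so run them concurrently:
    # total latency ≈ max(hop latencies), not their sum.
    return await asyncio.gather(
        hop("hop_1_vector", 0.045),
        hop("hop_2_graph", 0.130),
        hop("hop_3_exact", 0.025),
    )

results = asyncio.run(retrieve_all())
print(results)  # → ['hop_1_vector', 'hop_2_graph', 'hop_3_exact']
```

With the latency profile above, running the hops concurrently collapses 45 + 130 + 25 = 200ms of sequential retrieval into roughly 130ms.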

What’s Coming in Part 3

We’ve seen HOW the retrieval engine works. But how do you build this as a scalable, production-ready system?

In Part 3, we’ll explore:

  • Microservices architecture: Why and how
  • All 7 services breakdown with code examples
  • Service-to-service communication patterns
  • Why we chose pgvector, FastAPI, and Neo4j
  • Design tradeoffs and alternatives considered
  • Data schemas and API contracts

You’ll see the actual implementation architecture that makes this retrieval pipeline production-ready.


Key Takeaways

  1. Multi-hop retrieval is orchestrated, not sequential - layers inform each other
  2. Fusion scoring combines multiple signals - semantic, graph, temporal, exact
  3. Each hop serves a purpose: Hop 1 (recall), Hop 2 (relationships), Hop 3 (precision)
  4. The graph guides semantic search, not replaces it
  5. Explainability comes from tracking: every score, every hop, every decision


Continue the Series

Part 1: Beyond Vector Search ← Part 2 (you are here) → Part 3: Microservices Architecture

Coming Up Next: We’ll show you how this retrieval pipeline is implemented as a scalable microservices system with clear service boundaries, API contracts, and production-ready deployment.