The Multi-Hop Retrieval Pipeline - From Query to Answer
This is Part 2 of a 4-part series on building production-ready, multi-layer RAG systems with Ragforge.
- Part 1: Beyond Vector Search
- Part 2: The Multi-Hop Retrieval Pipeline (you are here)
- Part 3: Microservices Architecture
- Part 4: From Development to Production
Recap: The Promise of Multi-Layer RAG
In Part 1, we explored why traditional vector-only RAG systems struggle with complex queries that require:
- Understanding entity relationships
- Reasoning about time and sequences
- Traversing multiple information hops
- Combining evidence from multiple sources
Now let’s see exactly how multi-layer RAG solves these problems. We’ll trace a complete query through Ragforge’s retrieval pipeline, showing you every step, every decision, and every piece of data.
The Complete Query Flow
Let’s use our example from Part 1:
Query: “Who helped Arjuna after Day 10 of the war?”
This query will flow through seven major steps:

```text
Query → Parse → Expand → Retrieve → Fuse → Generate → Answer
         (1)    (2-3)      (4)      (5)    (6-7)
```

Let's dive into each step.
Step 1: Query Understanding & Parsing
The first step is understanding what the user is actually asking.
Entity Extraction
Example output from `QueryParser`:

```json
{
  "raw_query": "Who helped Arjuna after Day 10 of the war?",
  "entities": [
    { "text": "Arjuna", "type": "CHARACTER", "confidence": 0.98 },
    { "text": "Day 10", "type": "TEMPORAL_MARKER", "confidence": 0.95 },
    { "text": "war", "type": "EVENT", "context": "Kurukshetra War", "confidence": 0.92 }
  ],
  "intent": "temporal_causal",
  "query_type": "who_question",
  "temporal_constraint": { "type": "after", "reference": "Day 10" },
  "expected_answer_type": "CHARACTER"
}
```

Intent Classification
The query parser identifies this as a temporal-causal query because:
- It has a temporal constraint (“after Day 10”)
- It asks about causation/help (“who helped”)
- It requires relationship understanding
This classification determines how the retrieval pipeline prioritizes different layers.
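Intent classification like this can be done with an LLM call or, for common patterns, with simple surface rules. A minimal rule-based sketch follows; it is my own illustration of the idea, not Ragforge's actual parser, and the marker lists are assumptions:

```python
import re

# Hypothetical surface cues; a real parser would use a richer model.
TEMPORAL_MARKERS = re.compile(r"\b(after|before|during|following|since)\b", re.I)
CAUSAL_MARKERS = re.compile(r"\b(helped|caused|led to|because|why)\b", re.I)

def classify_intent(query: str) -> str:
    """Return a coarse intent label from surface cues in the query."""
    temporal = bool(TEMPORAL_MARKERS.search(query))
    causal = bool(CAUSAL_MARKERS.search(query))
    if temporal and causal:
        return "temporal_causal"
    if temporal:
        return "temporal"
    if causal:
        return "causal"
    return "factual"
```

Our example query trips both the temporal ("after") and causal ("helped") cues, which is exactly why it lands in the temporal-causal bucket.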
Query Type Detection
```python
# Query type determines retrieval strategy
QUERY_TYPES = {
    "who_question": {
        "priority": ["ontology", "vector", "exact"],
        "graph_relations": ["ALLY_OF", "HELPED", "SUPPORTED"],
        "temporal_aware": True
    },
    "what_question": {
        "priority": ["vector", "ontology", "exact"],
        "emphasis": "semantic_similarity"
    },
    "why_question": {
        "priority": ["ontology", "vector"],
        "graph_relations": ["CAUSED", "LED_TO", "RESULTED_IN"],
        "causal_aware": True
    }
}
```

For our "who_question", the system will prioritize ontology (graph) reasoning.
Step 2: Semantic Expansion
Now we expand the query semantically to improve recall.
Synonym Generation
```json
{
  "expansions": {
    "Arjuna": [
      "Arjun",
      "Pandava",
      "third Pandava brother",
      "Partha",       # another name for Arjuna
      "Dhananjaya"    # another epithet
    ],
    "helped": [
      "supported",
      "assisted",
      "aided",
      "protected",
      "allied with",
      "fought alongside",
      "advised"
    ],
    "war": [
      "Kurukshetra war",
      "battle",
      "Mahabharata war",
      "conflict",
      "Kurukshetra battle"
    ]
  }
}
```

Semantic Subquery Generation
```python
# Generate multiple semantic variants
semantic_queries = [
    "Who helped Arjuna after Day 10 of the war?",  # original
    "Who supported Arjuna following Day 10 of Kurukshetra?",
    "Which allies aided Arjuna in battles after Day 10?",
    "Who fought alongside Arjuna post Day 10?"
]

# Each will be embedded and searched
results = []
for query in semantic_queries:
    embedding = embed(query)
    results.extend(vector_search(embedding))
```

This increases recall by capturing different phrasings of the same question.
Step 3: Ontology Expansion
Now we leverage the knowledge graph to understand relationships.
Graph Query Construction
```cypher
// Find Arjuna and his connections
MATCH (arjuna:Character {name: "Arjuna"})
      -[r:ALLY_OF|BROTHER_OF|PROTECTED_BY|ADVISED_BY]-(helper:Character)
RETURN helper.name, type(r) as relationship

// Find events Arjuna participated in after Day 10
MATCH (arjuna:Character {name: "Arjuna"})-[:PARTICIPATED_IN]->(event:Battle)
WHERE event.day > 10
RETURN event

// Find who else participated in those events
MATCH (event:Battle)<-[:PARTICIPATED_IN]-(other:Character)
WHERE event.day > 10 AND other.name <> "Arjuna"
RETURN other.name, event.day
```

Ontology Expansion Results
```json
{
  "primary_entity": "Arjuna",
  "direct_connections": [
    {
      "entity": "Krishna",
      "relationship": "ALLY_OF",
      "roles": ["CHARIOTEER", "ADVISOR"],
      "relevance": 0.95
    },
    {
      "entity": "Bhima",
      "relationship": "BROTHER_OF",
      "roles": ["WARRIOR", "PROTECTOR"],
      "relevance": 0.88
    },
    {
      "entity": "Yudhishthira",
      "relationship": "BROTHER_OF",
      "roles": ["KING", "LEADER"],
      "relevance": 0.72
    }
  ],
  "events_after_day_10": [
    {
      "event": "Day 11 Battle",
      "participants": ["Arjuna", "Krishna", "Bhima", "Drona"],
      "day": 11
    },
    {
      "event": "Day 12 Battle",
      "participants": ["Arjuna", "Krishna", "Bhima"],
      "day": 12
    }
  ],
  "expanded_search_entities": [
    "Krishna",
    "Bhima",
    "Yudhishthira",
    "Day 11 Battle",
    "Day 12 Battle"
  ]
}
```

These expanded entities will guide the next retrieval hop.
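To make the flattening step concrete, here is a small hypothetical helper that turns an expansion result of this shape into the `expanded_search_entities` list. The function name and the `min_relevance` cutoff are my own illustration, not Ragforge's API:

```python
def build_expanded_entities(expansion: dict, min_relevance: float = 0.7) -> list:
    """Collect connected entities and post-constraint events from an
    ontology-expansion result into a flat, ordered search list."""
    entities = [
        conn["entity"]
        for conn in expansion.get("direct_connections", [])
        if conn.get("relevance", 0) >= min_relevance
    ]
    events = [ev["event"] for ev in expansion.get("events_after_day_10", [])]
    # Preserve order, drop duplicates
    seen, out = set(), []
    for name in entities + events:
        if name not in seen:
            seen.add(name)
            out.append(name)
    return out
```

Applied to the example above, this yields exactly the five entries shown in `expanded_search_entities`.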
Step 4: Multi-Hop Retrieval
Now comes the core of the system: multi-hop retrieval combining all layers.
Hop 1: Dense Vector Search
```python
# Vector search with original query
query_embedding = embed("Who helped Arjuna after Day 10 of the war?")

# SQL query to pgvector
results_hop1 = vector_db.query("""
    SELECT id, title, content, metadata, entities,
           1 - (embedding <=> %s) as similarity
    FROM documents
    WHERE metadata->>'entity_type' = 'battle_narrative'
    ORDER BY embedding <=> %s
    LIMIT 20
""", (query_embedding, query_embedding))
```

Hop 1 Results (top 5):
```json
[
  {
    "id": "doc_089",
    "similarity": 0.87,
    "content": "The Kurukshetra war entered its eleventh day with renewed intensity. Arjuna, still grieving from the previous day's losses...",
    "entities": ["Arjuna", "Day 11", "Kurukshetra"],
    "metadata": {"day": 11, "character": "Arjuna"}
  },
  {
    "id": "doc_134",
    "similarity": 0.84,
    "content": "Krishna continued to guide Arjuna through the darkest moments of the war. His counsel was invaluable...",
    "entities": ["Krishna", "Arjuna"],
    "metadata": {"character": ["Krishna", "Arjuna"]}
  }
  # ... 18 more results
]
```

Hop 2: Graph-Guided Retrieval
Using the ontology expansion results, retrieve documents about related entities.
```python
# Retrieve documents about Krishna (ally from graph)
results_hop2_krishna = vector_db.query("""
    SELECT * FROM documents
    WHERE entities @> '["Krishna"]'::jsonb
      AND (metadata->>'day')::int > 10
    ORDER BY embedding <=> %s
    LIMIT 10
""", (query_embedding,))

# Retrieve documents about Bhima (brother from graph)
results_hop2_bhima = vector_db.query("""
    SELECT * FROM documents
    WHERE entities @> '["Bhima"]'::jsonb
      AND (metadata->>'day')::int > 10
    ORDER BY embedding <=> %s
    LIMIT 10
""", (query_embedding,))

# Retrieve documents about Day 11-12 events (from graph)
results_hop2_events = vector_db.query("""
    SELECT * FROM documents
    WHERE (metadata->>'day')::int IN (11, 12)
      AND entities ?| array['Arjuna', 'Krishna', 'Bhima']
    ORDER BY embedding <=> %s
    LIMIT 10
""", (query_embedding,))
```

Hop 2 Results (examples):
```json
[
  {
    "id": "doc_156",
    "source": "krishna_expansion",
    "content": "On the morning of Day 11, Krishna spoke to Arjuna: 'Today we must break through their center...'",
    "entities": ["Krishna", "Arjuna", "Day 11"],
    "metadata": {"day": 11, "type": "dialogue"}
  },
  {
    "id": "doc_203",
    "source": "bhima_expansion",
    "content": "Bhima, seeing his brother's exhaustion, took position at Arjuna's left flank on Day 12...",
    "entities": ["Bhima", "Arjuna", "Day 12"],
    "metadata": {"day": 12, "action": "protection"}
  }
]
```

Notice how Hop 2 finds highly relevant documents that weren't in Hop 1's top results!
Hop 3: Exact Match & Metadata Filtering
For temporal queries, exact matching on metadata is crucial.
```python
# Exact temporal filter
results_hop3 = vector_db.query("""
    SELECT * FROM documents
    WHERE (metadata->>'day')::int > 10
      AND (
        content ILIKE '%helped Arjuna%'
        OR content ILIKE '%assisted Arjuna%'
        OR content ILIKE '%supported Arjuna%'
      )
      AND entities ? 'Arjuna'
    ORDER BY (metadata->>'day')::int ASC
    LIMIT 10
""")
```

Hop 3 Results:
```json
[
  {
    "id": "doc_287",
    "match_type": "exact",
    "content": "Day 11 saw Krishna intensify his strategic guidance. His advice helped Arjuna overcome his doubt...",
    "metadata": {"day": 11, "exact_match": "helped Arjuna"}
  },
  {
    "id": "doc_301",
    "match_type": "exact",
    "content": "Throughout Day 12 and 13, Bhima protected Arjuna's position, allowing him to focus on long-range attacks...",
    "metadata": {"day": 12, "exact_match": "protected Arjuna"}
  }
]
```

Combined Results
```python
# Total results from all hops
total_results = {
    "hop_1_vector": 20,
    "hop_2_graph_krishna": 10,
    "hop_2_graph_bhima": 10,
    "hop_2_graph_events": 10,
    "hop_3_exact": 10,
    "total_before_dedup": 60
}
```

Step 5: Fusion & Reranking
Now we need to merge, deduplicate, and rank these 60 results.
Deduplication
```python
# Remove duplicate documents
unique_results = deduplicate_by_id(all_results)
# 60 results → 42 unique documents
```

Multi-Signal Scoring
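A sketch of what `deduplicate_by_id` might look like, assuming each result dict carries an `id` and an optional `similarity` score. Keeping the highest-scoring copy of each id is my assumption, not necessarily Ragforge's exact policy:

```python
def deduplicate_by_id(results: list) -> list:
    """Collapse duplicate document ids, keeping the best-scoring copy."""
    best = {}
    for doc in results:
        doc_id = doc["id"]
        score = doc.get("similarity", 0.0)
        if doc_id not in best or score > best[doc_id].get("similarity", 0.0):
            best[doc_id] = doc
    return list(best.values())
```

Keeping the best copy matters because the same document can arrive from Hop 1 with a similarity score and from Hop 3 without one.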
Each document gets scored based on multiple signals:
```python
def compute_fusion_score(doc, query_info):
    scores = {
        # Semantic similarity (from vector search)
        "semantic": doc.similarity_score,  # 0-1

        # Graph centrality (from ontology)
        "graph": compute_graph_score(doc.entities, query_info.entities),

        # Temporal relevance
        "temporal": compute_temporal_score(
            doc.metadata.get('day'),
            query_info.temporal_constraint
        ),

        # Exact match bonus
        "exact": 1.0 if doc.match_type == "exact" else 0.0,

        # Hop priority (earlier hops slightly preferred)
        "hop": 1.0 if doc.hop == 1 else 0.9 if doc.hop == 2 else 0.8
    }

    # Weighted combination
    weights = {
        "semantic": 0.25,
        "graph": 0.30,     # Higher for "who" questions
        "temporal": 0.25,  # Higher for "after" queries
        "exact": 0.15,
        "hop": 0.05
    }

    return sum(scores[k] * weights[k] for k in scores)
```

Example Scoring
```python
# Document 1: High semantic, low graph
doc_089 = {
    "content": "Day 11 battle description...",
    "scores": {
        "semantic": 0.87,
        "graph": 0.45,     # Doesn't mention key allies
        "temporal": 1.0,   # Perfect temporal match
        "exact": 0.0,
        "hop": 1.0
    },
    "final_score": 0.71
}

# Document 2: Medium semantic, high graph
doc_156 = {
    "content": "Krishna spoke to Arjuna on Day 11...",
    "scores": {
        "semantic": 0.76,
        "graph": 0.95,     # Krishna is key ally from graph
        "temporal": 1.0,
        "exact": 0.0,
        "hop": 0.9
    },
    "final_score": 0.85  # Beats doc_089
}

# Document 3: Low semantic, high graph + exact
doc_203 = {
    "content": "Bhima protected Arjuna's position...",
    "scores": {
        "semantic": 0.68,
        "graph": 0.88,     # Bhima is key ally from graph
        "temporal": 0.95,  # Day 12 is after 10, slight delay
        "exact": 1.0,      # Contains "protected Arjuna"
        "hop": 0.9
    },
    "final_score": 0.87  # HIGHEST!
}
```

Cross-Encoder Reranking (Optional)
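The examples above reference `compute_temporal_score` without defining it. One plausible sketch, chosen to be consistent with the sample scores (1.0 for Day 11 and 0.95 for Day 12 under an "after Day 10" constraint); the linear decay rate is my assumption, not Ragforge's actual formula:

```python
def compute_temporal_score(doc_day, constraint: dict) -> float:
    """Score temporal fit: 1.0 for the day right after the reference,
    decaying 0.05 per extra day; 0.0 if the constraint is violated
    or the document has no day metadata."""
    if doc_day is None or constraint is None:
        return 0.0
    # Parse the reference, e.g. "Day 10" -> 10
    reference = int(constraint["reference"].split()[-1])
    if constraint["type"] == "after":
        if doc_day <= reference:
            return 0.0
        return max(0.0, 1.0 - 0.05 * (doc_day - reference - 1))
    return 0.0
```

With this definition, a Day 11 document scores 1.0 and a Day 12 document scores 0.95, matching the worked examples.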
For even better results, use a cross-encoder model:
```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# Rerank top 15 from fusion
pairs = [(query, doc.content) for doc in top_15]
rerank_scores = reranker.predict(pairs)

# Combine with fusion scores
final_scores = [
    0.7 * fusion_score + 0.3 * rerank_score
    for fusion_score, rerank_score in zip(fusion_scores, rerank_scores)
]
```

Final Ranked Context
```python
top_10_context = [
    {
        "rank": 1,
        "doc_id": "doc_203",
        "score": 0.87,
        "content": "Bhima protected Arjuna's position on Day 12...",
        "why_relevant": ["graph_ally", "temporal_match", "exact_match"]
    },
    {
        "rank": 2,
        "doc_id": "doc_156",
        "score": 0.85,
        "content": "Krishna spoke to Arjuna on Day 11...",
        "why_relevant": ["graph_ally", "temporal_match", "high_semantic"]
    },
    {
        "rank": 3,
        "doc_id": "doc_301",
        "score": 0.82,
        "content": "Throughout Days 12-13, Bhima provided cover...",
        "why_relevant": ["graph_ally", "temporal_match"]
    },
    # ... 7 more
]
```

Step 6: Prompt Construction
Now we build an optimized prompt for the LLM.
Prompt Template
```python
prompt_template = """You are an expert on the Mahabharata epic. Answer the question based ONLY on the provided context.

Question: {query}

Context:
{context}

Instructions:
1. Answer the question directly and concisely
2. Use ONLY information from the provided context
3. Cite sources using [doc_id] notation
4. If the context doesn't fully answer the question, say so
5. List all helpers mentioned, with their roles

Answer:"""
```

Context Assembly
```python
context = ""
for doc in top_10_context:
    context += f"\n[{doc['doc_id']}] (Score: {doc['score']:.2f})\n"
    context += f"{doc['content']}\n"
    context += f"Relevance: {', '.join(doc['why_relevant'])}\n"
    context += "---\n"
```
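One thing the assembly loop glosses over is the LLM's context window. A hedged sketch of budget-aware assembly, approximating tokens with whitespace-separated words (a simplification of my own; Ragforge's actual budgeting may use a real tokenizer):

```python
def assemble_context(docs, max_words=1500):
    """Greedily append ranked docs until an approximate word budget is hit.
    Each doc is a dict with 'doc_id', 'score', and 'content' keys."""
    parts, used = [], 0
    for doc in docs:
        block = f"[{doc['doc_id']}] (Score: {doc['score']:.2f})\n{doc['content']}\n---"
        words = len(block.split())
        if used + words > max_words:
            break  # stop before overflowing the budget
        parts.append(block)
        used += words
    return "\n".join(parts)
```

Because the docs arrive ranked, stopping at the budget drops only the least relevant context.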
```text
You are an expert on the Mahabharata epic. Answer the question based ONLY on the provided context.

Question: Who helped Arjuna after Day 10 of the war?

Context:

[doc_203] (Score: 0.87)
Bhima protected Arjuna's position on Day 12, allowing his brother to focus on long-range attacks against the Kaurava formation. His mace cleared the path forward.
Relevance: graph_ally, temporal_match, exact_match
---

[doc_156] (Score: 0.85)
Krishna spoke to Arjuna on Day 11: "Today we must break through their center. I will guide you." His strategic counsel helped Arjuna overcome doubt and hesitation.
Relevance: graph_ally, temporal_match, high_semantic
---

[doc_301] (Score: 0.82)
Throughout Days 12-13, Bhima provided cover while Krishna continued his role as charioteer and advisor, guiding Arjuna through difficult tactical decisions.
Relevance: graph_ally, temporal_match
---

[Additional context documents...]

Instructions:
1. Answer the question directly and concisely
2. Use ONLY information from the provided context
3. Cite sources using [doc_id] notation
4. If the context doesn't fully answer the question, say so
5. List all helpers mentioned, with their roles

Answer:
```

Step 7: LLM Generation & Citation
Send to LLM and receive grounded answer:
LLM Response
```json
{
  "answer": "After Day 10 of the Kurukshetra war, Arjuna received help from two key individuals:\n\n1. **Krishna** - Continued his role as Arjuna's charioteer and strategic advisor. On Day 11, Krishna provided crucial tactical counsel that helped Arjuna overcome doubt [doc_156]. Throughout Days 12-13, Krishna guided Arjuna through difficult decisions [doc_301].\n\n2. **Bhima** - Arjuna's brother protected his position on Day 12, using his mace to clear paths and allowing Arjuna to focus on long-range attacks [doc_203]. Bhima continued providing defensive cover throughout Days 12-13 [doc_301].\n\nBoth Krishna (as advisor/charioteer) and Bhima (as protector/warrior) played essential roles in supporting Arjuna during the battles following Day 10.",
  "citations": [
    {
      "doc_id": "doc_156",
      "claim": "Krishna provided tactical counsel on Day 11",
      "relevance": "direct_answer"
    },
    {
      "doc_id": "doc_203",
      "claim": "Bhima protected Arjuna's position on Day 12",
      "relevance": "direct_answer"
    },
    {
      "doc_id": "doc_301",
      "claim": "Both continued support through Days 12-13",
      "relevance": "supporting_evidence"
    }
  ],
  "confidence": 0.92,
  "retrieval_trace": {
    "hops": 3,
    "total_docs_considered": 42,
    "final_context_docs": 10,
    "semantic_contribution": 0.25,
    "graph_contribution": 0.50,
    "temporal_contribution": 0.25
  }
}
```

Why This Works: Key Advantages
1. Relationship-Aware
The graph expansion surfaced Krishna and Bhima as key allies before semantic search ever had to rank them. Without the graph:
- Krishna might be rank 15 (mentioned in many contexts)
- Bhima might be rank 30 (less semantically similar to “help”)
2. Temporal-Sensitive
The temporal filter (“after Day 10”) ensured we only considered Days 11+. Without this:
- Documents about earlier days would score highly
- “Helped before Day 10” would confuse the answer
3. Multi-Hop Context
Different information came from different hops:
- Hop 1 found general battle narratives
- Hop 2 found specific ally actions (via graph)
- Hop 3 found exact phrases about helping
4. Explainable & Traceable
Every fact in the answer can be traced:
- “Krishna provided counsel” ← doc_156 (rank 2, score 0.85)
- “Bhima protected position” ← doc_203 (rank 1, score 0.87)
- Why ranked this way: graph_ally + temporal_match + exact_match
Performance Characteristics
Latency Breakdown
For the example query:
```text
Latency Profile
----------------------------------------------
Query Parsing:            10ms
Semantic Expansion:       15ms
Ontology Expansion:      120ms  (graph DB)
Hop 1 (Vector):           45ms
Hop 2 (3 queries):       130ms  (parallel)
Hop 3 (Exact):            25ms
Deduplication:             5ms
Fusion Scoring:           30ms
Cross-Encoder Rerank:    180ms  (optional)
Prompt Construction:      10ms
LLM Generation:         2500ms
----------------------------------------------
TOTAL (without rerank):  ~2.9s
TOTAL (with rerank):     ~3.1s
```

Optimization Strategies
- Parallel Execution: Hops 1, 2, and 3 can run concurrently
- Caching: Cache ontology expansions for common entities
- Early Stopping: If Hop 1 has high-confidence exact matches, skip Hop 3
- Adaptive Depth: Simple queries use fewer hops
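The first strategy is easy to sketch: since the hops are independent reads against the database, they can run in a thread pool. This is an illustrative pattern, not Ragforge's code, and the hop function names are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

def run_hops_parallel(query, hop_fns):
    """Run independent retrieval hops concurrently and flatten the results.
    Each hop_fn takes the query and returns a list of documents."""
    with ThreadPoolExecutor(max_workers=len(hop_fns)) as pool:
        futures = [pool.submit(fn, query) for fn in hop_fns]
        results = []
        # Iterating in submit order keeps hop ordering stable for fusion
        for fut in futures:
            results.extend(fut.result())
    return results
```

With the latency profile above, running Hops 1-3 in parallel caps their combined cost at the slowest hop (~130ms) instead of their sum (~200ms).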
What’s Coming in Part 3
We’ve seen HOW the retrieval engine works. But how do you build this as a scalable, production-ready system?
In Part 3, we’ll explore:
- Microservices architecture: Why and how
- All 7 services breakdown with code examples
- Service-to-service communication patterns
- Why we chose pgvector, FastAPI, and Neo4j
- Design tradeoffs and alternatives considered
- Data schemas and API contracts
You’ll see the actual implementation architecture that makes this retrieval pipeline production-ready.
Key Takeaways
- Multi-hop retrieval is orchestrated, not sequential - layers inform each other
- Fusion scoring combines multiple signals - semantic, graph, temporal, exact
- Each hop serves a purpose: Hop 1 (recall), Hop 2 (relationships), Hop 3 (precision)
- The graph guides semantic search, not replaces it
- Explainability comes from tracking: every score, every hop, every decision
Continue the Series
Part 1: Beyond Vector Search ← Part 2 (you are here) → Part 3: Microservices Architecture
Coming Up Next: We’ll show you how this retrieval pipeline is implemented as a scalable microservices system with clear service boundaries, API contracts, and production-ready deployment.