The Multi-Hop Retrieval Pipeline - From Query to Answer
This is Part 2 of a 4-part series on building production-ready, multi-layer RAG systems with Ragforge.
- Part 1: Beyond Vector Search
- Part 2: The Multi-Hop Retrieval Pipeline (you are here)
- Part 3: Microservices Architecture
- Part 4: From Development to Production
Recap: The Promise of Multi-Layer RAG
In Part 1, we explored why traditional vector-only RAG systems struggle with complex queries that require:
- Understanding entity relationships
- Reasoning about time and sequences
- Traversing multiple information hops
- Combining evidence from multiple sources
Now let’s see exactly how multi-layer RAG solves these problems. We’ll trace a complete query through Ragforge’s retrieval pipeline, showing you every step, every decision, and every piece of data.
The Complete Query Flow
Let’s use our example from Part 1:
Query: “Who helped Arjuna after Day 10 of the war?”
This query will flow through seven major steps:

```text
Query → Parse → Expand → Retrieve → Fuse → Generate → Answer
         (1)    (2-3)      (4)      (5)    (6-7)
```

Let's dive into each step.
Step 1: Query Understanding & Parsing
The first step is understanding what the user is actually asking.
Entity Extraction
Example output from `QueryParser`:

```json
{
  "raw_query": "Who helped Arjuna after Day 10 of the war?",
  "entities": [
    { "text": "Arjuna", "type": "CHARACTER", "confidence": 0.98 },
    { "text": "Day 10", "type": "TEMPORAL_MARKER", "confidence": 0.95 },
    { "text": "war", "type": "EVENT", "context": "Kurukshetra War", "confidence": 0.92 }
  ],
  "intent": "temporal_causal",
  "query_type": "who_question",
  "temporal_constraint": { "type": "after", "reference": "Day 10" },
  "expected_answer_type": "CHARACTER"
}
```

Intent Classification
The query parser identifies this as a temporal-causal query because:
- It has a temporal constraint (“after Day 10”)
- It asks about causation/help (“who helped”)
- It requires relationship understanding
This classification determines how the retrieval pipeline prioritizes different layers.
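Intent classification like this can be done with an LLM call or, for common patterns, with simple surface rules. A minimal rule-based sketch follows; it is my own illustration of the idea, not Ragforge's actual parser, and the marker lists are assumptions:

```python
import re

# Hypothetical surface cues; a real parser would use a richer model.
TEMPORAL_MARKERS = re.compile(r"\b(after|before|during|following|since)\b", re.I)
CAUSAL_MARKERS = re.compile(r"\b(helped|caused|led to|because|why)\b", re.I)

def classify_intent(query: str) -> str:
    """Return a coarse intent label from surface cues in the query."""
    temporal = bool(TEMPORAL_MARKERS.search(query))
    causal = bool(CAUSAL_MARKERS.search(query))
    if temporal and causal:
        return "temporal_causal"
    if temporal:
        return "temporal"
    if causal:
        return "causal"
    return "factual"
```

Our example query trips both the temporal ("after") and causal ("helped") cues, which is exactly why it lands in the temporal-causal bucket.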
Query Type Detection
```python
# Query type determines retrieval strategy
QUERY_TYPES = {
    "who_question": {
        "priority": ["ontology", "vector", "exact"],
        "graph_relations": ["ALLY_OF", "HELPED", "SUPPORTED"],
        "temporal_aware": True
    },
    "what_question": {
        "priority": ["vector", "ontology", "exact"],
        "emphasis": "semantic_similarity"
    },
    "why_question": {
        "priority": ["ontology", "vector"],
        "graph_relations": ["CAUSED", "LED_TO", "RESULTED_IN"],
        "causal_aware": True
    }
}
```

For our "who_question", the system will prioritize ontology (graph) reasoning.
Step 2: Semantic Expansion
Now we expand the query semantically to improve recall.
Synonym Generation
```json
{
  "expansions": {
    "Arjuna": [
      "Arjun",
      "Pandava",
      "third Pandava brother",
      "Partha",       # another name for Arjuna
      "Dhananjaya"    # another epithet
    ],
    "helped": [
      "supported",
      "assisted",
      "aided",
      "protected",
      "allied with",
      "fought alongside",
      "advised"
    ],
    "war": [
      "Kurukshetra war",
      "battle",
      "Mahabharata war",
      "conflict",
      "Kurukshetra battle"
    ]
  }
}
```

Semantic Subquery Generation
```python
# Generate multiple semantic variants
semantic_queries = [
    "Who helped Arjuna after Day 10 of the war?",  # original
    "Who supported Arjuna following Day 10 of Kurukshetra?",
    "Which allies aided Arjuna in battles after Day 10?",
    "Who fought alongside Arjuna post Day 10?"
]

# Each will be embedded and searched
results = []
for query in semantic_queries:
    embedding = embed(query)
    results.extend(vector_search(embedding))
```

This increases recall by capturing different phrasings of the same question.
Step 3: Ontology Expansion
Now we leverage the knowledge graph to understand relationships.
Graph Query Construction
```cypher
// Find Arjuna and his connections
MATCH (arjuna:Character {name: "Arjuna"})
      -[r:ALLY_OF|BROTHER_OF|PROTECTED_BY|ADVISED_BY]-(helper:Character)
RETURN helper.name, type(r) as relationship

// Find events Arjuna participated in after Day 10
MATCH (arjuna:Character {name: "Arjuna"})-[:PARTICIPATED_IN]->(event:Battle)
WHERE event.day > 10
RETURN event

// Find who else participated in those events
MATCH (event:Battle)<-[:PARTICIPATED_IN]-(other:Character)
WHERE event.day > 10 AND other.name <> "Arjuna"
RETURN other.name, event.day
```

Ontology Expansion Results
```json
{
  "primary_entity": "Arjuna",
  "direct_connections": [
    {
      "entity": "Krishna",
      "relationship": "ALLY_OF",
      "roles": ["CHARIOTEER", "ADVISOR"],
      "relevance": 0.95
    },
    {
      "entity": "Bhima",
      "relationship": "BROTHER_OF",
      "roles": ["WARRIOR", "PROTECTOR"],
      "relevance": 0.88
    },
    {
      "entity": "Yudhishthira",
      "relationship": "BROTHER_OF",
      "roles": ["KING", "LEADER"],
      "relevance": 0.72
    }
  ],
  "events_after_day_10": [
    {
      "event": "Day 11 Battle",
      "participants": ["Arjuna", "Krishna", "Bhima", "Drona"],
      "day": 11
    },
    {
      "event": "Day 12 Battle",
      "participants": ["Arjuna", "Krishna", "Bhima"],
      "day": 12
    }
  ],
  "expanded_search_entities": [
    "Krishna",
    "Bhima",
    "Yudhishthira",
    "Day 11 Battle",
    "Day 12 Battle"
  ]
}
```

These expanded entities will guide the next retrieval hop.
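To make the flattening step concrete, here is a small hypothetical helper that turns an expansion result of this shape into the `expanded_search_entities` list. The function name and the `min_relevance` cutoff are my own illustration, not Ragforge's API:

```python
def build_expanded_entities(expansion: dict, min_relevance: float = 0.7) -> list:
    """Collect connected entities and post-constraint events from an
    ontology-expansion result into a flat, ordered search list."""
    entities = [
        conn["entity"]
        for conn in expansion.get("direct_connections", [])
        if conn.get("relevance", 0) >= min_relevance
    ]
    events = [ev["event"] for ev in expansion.get("events_after_day_10", [])]
    # Preserve order, drop duplicates
    seen, out = set(), []
    for name in entities + events:
        if name not in seen:
            seen.add(name)
            out.append(name)
    return out
```

Applied to the example above, this yields exactly the five entries shown in `expanded_search_entities`.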
Step 4: Multi-Hop Retrieval
Now comes the core of the system: multi-hop retrieval combining all layers.
Hop 1: Dense Vector Search
```python
# Vector search with original query
query_embedding = embed("Who helped Arjuna after Day 10 of the war?")

# SQL query to pgvector
results_hop1 = vector_db.query("""
    SELECT id, title, content, metadata, entities,
           1 - (embedding <=> %s) as similarity
    FROM documents
    WHERE metadata->>'entity_type' = 'battle_narrative'
    ORDER BY embedding <=> %s
    LIMIT 20
""", (query_embedding, query_embedding))
```

Hop 1 Results (top 5):
```json
[
  {
    "id": "doc_089",
    "similarity": 0.87,
    "content": "The Kurukshetra war entered its eleventh day with renewed intensity. Arjuna, still grieving from the previous day's losses...",
    "entities": ["Arjuna", "Day 11", "Kurukshetra"],
    "metadata": {"day": 11, "character": "Arjuna"}
  },
  {
    "id": "doc_134",
    "similarity": 0.84,
    "content": "Krishna continued to guide Arjuna through the darkest moments of the war. His counsel was invaluable...",
    "entities": ["Krishna", "Arjuna"],
    "metadata": {"character": ["Krishna", "Arjuna"]}
  }
  # ... 18 more results
]
```

Hop 2: Graph-Guided Retrieval
Using the ontology expansion results, retrieve documents about related entities.
```python
# Retrieve documents about Krishna (ally from graph)
results_hop2_krishna = vector_db.query("""
    SELECT * FROM documents
    WHERE entities @> '["Krishna"]'::jsonb
      AND (metadata->>'day')::int > 10
    ORDER BY embedding <=> %s
    LIMIT 10
""", (query_embedding,))

# Retrieve documents about Bhima (brother from graph)
results_hop2_bhima = vector_db.query("""
    SELECT * FROM documents
    WHERE entities @> '["Bhima"]'::jsonb
      AND (metadata->>'day')::int > 10
    ORDER BY embedding <=> %s
    LIMIT 10
""", (query_embedding,))

# Retrieve documents about Day 11-12 events (from graph)
results_hop2_events = vector_db.query("""
    SELECT * FROM documents
    WHERE (metadata->>'day')::int IN (11, 12)
      AND entities ?| array['Arjuna', 'Krishna', 'Bhima']
    ORDER BY embedding <=> %s
    LIMIT 10
""", (query_embedding,))
```

Hop 2 Results (examples):
```json
[
  {
    "id": "doc_156",
    "source": "krishna_expansion",
    "content": "On the morning of Day 11, Krishna spoke to Arjuna: 'Today we must break through their center...'",
    "entities": ["Krishna", "Arjuna", "Day 11"],
    "metadata": {"day": 11, "type": "dialogue"}
  },
  {
    "id": "doc_203",
    "source": "bhima_expansion",
    "content": "Bhima, seeing his brother's exhaustion, took position at Arjuna's left flank on Day 12...",
    "entities": ["Bhima", "Arjuna", "Day 12"],
    "metadata": {"day": 12, "action": "protection"}
  }
]
```

Notice how Hop 2 finds highly relevant documents that weren't in Hop 1's top results!
Hop 3: Exact Match & Metadata Filtering
For temporal queries, exact matching on metadata is crucial.
```python
# Exact temporal filter
results_hop3 = vector_db.query("""
    SELECT * FROM documents
    WHERE (metadata->>'day')::int > 10
      AND (
        content ILIKE '%helped Arjuna%'
        OR content ILIKE '%assisted Arjuna%'
        OR content ILIKE '%supported Arjuna%'
      )
      AND entities ? 'Arjuna'
    ORDER BY (metadata->>'day')::int ASC
    LIMIT 10
""")
```

Hop 3 Results:
```json
[
  {
    "id": "doc_287",
    "match_type": "exact",
    "content": "Day 11 saw Krishna intensify his strategic guidance. His advice helped Arjuna overcome his doubt...",
    "metadata": {"day": 11, "exact_match": "helped Arjuna"}
  },
  {
    "id": "doc_301",
    "match_type": "exact",
    "content": "Throughout Day 12 and 13, Bhima protected Arjuna's position, allowing him to focus on long-range attacks...",
    "metadata": {"day": 12, "exact_match": "protected Arjuna"}
  }
]
```

Combined Results
```python
# Total results from all hops
total_results = {
    "hop_1_vector": 20,
    "hop_2_graph_krishna": 10,
    "hop_2_graph_bhima": 10,
    "hop_2_graph_events": 10,
    "hop_3_exact": 10,
    "total_before_dedup": 60
}
```

Step 5: Fusion & Reranking
Now we need to merge, deduplicate, and rank these 60 results.
Deduplication
```python
# Remove duplicate documents
unique_results = deduplicate_by_id(all_results)
# 60 results → 42 unique documents
```

Multi-Signal Scoring
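A sketch of what `deduplicate_by_id` might look like, assuming each result dict carries an `id` and an optional `similarity` score. Keeping the highest-scoring copy of each id is my assumption, not necessarily Ragforge's exact policy:

```python
def deduplicate_by_id(results: list) -> list:
    """Collapse duplicate document ids, keeping the best-scoring copy."""
    best = {}
    for doc in results:
        doc_id = doc["id"]
        score = doc.get("similarity", 0.0)
        if doc_id not in best or score > best[doc_id].get("similarity", 0.0):
            best[doc_id] = doc
    return list(best.values())
```

Keeping the best copy matters because the same document can arrive from Hop 1 with a similarity score and from Hop 3 without one.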
Each document gets scored based on multiple signals:
```python
def compute_fusion_score(doc, query_info):
    scores = {
        # Semantic similarity (from vector search)
        "semantic": doc.similarity_score,  # 0-1

        # Graph centrality (from ontology)
        "graph": compute_graph_score(doc.entities, query_info.entities),

        # Temporal relevance
        "temporal": compute_temporal_score(
            doc.metadata.get('day'),
            query_info.temporal_constraint
        ),

        # Exact match bonus
        "exact": 1.0 if doc.match_type == "exact" else 0.0,

        # Hop priority (earlier hops slightly preferred)
        "hop": 1.0 if doc.hop == 1 else 0.9 if doc.hop == 2 else 0.8
    }

    # Weighted combination
    weights = {
        "semantic": 0.25,
        "graph": 0.30,     # Higher for "who" questions
        "temporal": 0.25,  # Higher for "after" queries
        "exact": 0.15,
        "hop": 0.05
    }

    return sum(scores[k] * weights[k] for k in scores)
```

Example Scoring
```python
# Document 1: High semantic, low graph
doc_089 = {
    "content": "Day 11 battle description...",
    "scores": {
        "semantic": 0.87,
        "graph": 0.45,     # Doesn't mention key allies
        "temporal": 1.0,   # Perfect temporal match
        "exact": 0.0,
        "hop": 1.0
    },
    "final_score": 0.71
}

# Document 2: Medium semantic, high graph
doc_156 = {
    "content": "Krishna spoke to Arjuna on Day 11...",
    "scores": {
        "semantic": 0.76,
        "graph": 0.95,     # Krishna is key ally from graph
        "temporal": 1.0,
        "exact": 0.0,
        "hop": 0.9
    },
    "final_score": 0.85  # Beats doc_089
}

# Document 3: Low semantic, high graph + exact
doc_203 = {
    "content": "Bhima protected Arjuna's position...",
    "scores": {
        "semantic": 0.68,
        "graph": 0.88,     # Bhima is key ally from graph
        "temporal": 0.95,  # Day 12 is after 10, slight delay
        "exact": 1.0,      # Contains "protected Arjuna"
        "hop": 0.9
    },
    "final_score": 0.87  # HIGHEST!
}
```

Cross-Encoder Reranking (Optional)
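The examples above reference `compute_temporal_score` without defining it. One plausible sketch, chosen to be consistent with the sample scores (1.0 for Day 11 and 0.95 for Day 12 under an "after Day 10" constraint); the linear decay rate is my assumption, not Ragforge's actual formula:

```python
def compute_temporal_score(doc_day, constraint: dict) -> float:
    """Score temporal fit: 1.0 for the day right after the reference,
    decaying 0.05 per extra day; 0.0 if the constraint is violated
    or the document has no day metadata."""
    if doc_day is None or constraint is None:
        return 0.0
    # Parse the reference, e.g. "Day 10" -> 10
    reference = int(constraint["reference"].split()[-1])
    if constraint["type"] == "after":
        if doc_day <= reference:
            return 0.0
        return max(0.0, 1.0 - 0.05 * (doc_day - reference - 1))
    return 0.0
```

With this definition, a Day 11 document scores 1.0 and a Day 12 document scores 0.95, matching the worked examples.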
For even better results, use a cross-encoder model:
```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# Rerank top 15 from fusion
pairs = [(query, doc.content) for doc in top_15]
rerank_scores = reranker.predict(pairs)

# Combine with fusion scores
final_scores = [
    0.7 * fusion_score + 0.3 * rerank_score
    for fusion_score, rerank_score in zip(fusion_scores, rerank_scores)
]
```

Final Ranked Context
```python
top_10_context = [
    {
        "rank": 1,
        "doc_id": "doc_203",
        "score": 0.87,
        "content": "Bhima protected Arjuna's position on Day 12...",
        "why_relevant": ["graph_ally", "temporal_match", "exact_match"]
    },
    {
        "rank": 2,
        "doc_id": "doc_156",
        "score": 0.85,
        "content": "Krishna spoke to Arjuna on Day 11...",
        "why_relevant": ["graph_ally", "temporal_match", "high_semantic"]
    },
    {
        "rank": 3,
        "doc_id": "doc_301",
        "score": 0.82,
        "content": "Throughout Days 12-13, Bhima provided cover...",
        "why_relevant": ["graph_ally", "temporal_match"]
    },
    # ... 7 more
]
```

Step 6: Prompt Construction
Now we build an optimized prompt for the LLM.
Prompt Template
```python
prompt_template = """You are an expert on the Mahabharata epic. Answer the question based ONLY on the provided context.

Question: {query}

Context:
{context}

Instructions:
1. Answer the question directly and concisely
2. Use ONLY information from the provided context
3. Cite sources using [doc_id] notation
4. If the context doesn't fully answer the question, say so
5. List all helpers mentioned, with their roles

Answer:"""
```

Context Assembly
```python
context = ""
for doc in top_10_context:
    context += f"\n[{doc['doc_id']}] (Score: {doc['score']:.2f})\n"
    context += f"{doc['content']}\n"
    context += f"Relevance: {', '.join(doc['why_relevant'])}\n"
    context += "---\n"
```
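One thing the assembly loop glosses over is the LLM's context window. A hedged sketch of budget-aware assembly, approximating tokens with whitespace-separated words (a simplification of my own; Ragforge's actual budgeting may use a real tokenizer):

```python
def assemble_context(docs, max_words=1500):
    """Greedily append ranked docs until an approximate word budget is hit.
    Each doc is a dict with 'doc_id', 'score', and 'content' keys."""
    parts, used = [], 0
    for doc in docs:
        block = f"[{doc['doc_id']}] (Score: {doc['score']:.2f})\n{doc['content']}\n---"
        words = len(block.split())
        if used + words > max_words:
            break  # stop before overflowing the budget
        parts.append(block)
        used += words
    return "\n".join(parts)
```

Because the docs arrive ranked, stopping at the budget drops only the least relevant context.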
```text
You are an expert on the Mahabharata epic. Answer the question based ONLY on the provided context.

Question: Who helped Arjuna after Day 10 of the war?

Context:

[doc_203] (Score: 0.87)
Bhima protected Arjuna's position on Day 12, allowing his brother to focus on long-range attacks against the Kaurava formation. His mace cleared the path forward.
Relevance: graph_ally, temporal_match, exact_match
---

[doc_156] (Score: 0.85)
Krishna spoke to Arjuna on Day 11: "Today we must break through their center. I will guide you." His strategic counsel helped Arjuna overcome doubt and hesitation.
Relevance: graph_ally, temporal_match, high_semantic
---

[doc_301] (Score: 0.82)
Throughout Days 12-13, Bhima provided cover while Krishna continued his role as charioteer and advisor, guiding Arjuna through difficult tactical decisions.
Relevance: graph_ally, temporal_match
---

[Additional context documents...]

Instructions:
1. Answer the question directly and concisely
2. Use ONLY information from the provided context
3. Cite sources using [doc_id] notation
4. If the context doesn't fully answer the question, say so
5. List all helpers mentioned, with their roles

Answer:
```

Step 7: LLM Generation & Citation
Send to LLM and receive grounded answer:
LLM Response
```json
{
  "answer": "After Day 10 of the Kurukshetra war, Arjuna received help from two key individuals:\n\n1. **Krishna** - Continued his role as Arjuna's charioteer and strategic advisor. On Day 11, Krishna provided crucial tactical counsel that helped Arjuna overcome doubt [doc_156]. Throughout Days 12-13, Krishna guided Arjuna through difficult decisions [doc_301].\n\n2. **Bhima** - Arjuna's brother protected his position on Day 12, using his mace to clear paths and allowing Arjuna to focus on long-range attacks [doc_203]. Bhima continued providing defensive cover throughout Days 12-13 [doc_301].\n\nBoth Krishna (as advisor/charioteer) and Bhima (as protector/warrior) played essential roles in supporting Arjuna during the battles following Day 10.",
  "citations": [
    {
      "doc_id": "doc_156",
      "claim": "Krishna provided tactical counsel on Day 11",
      "relevance": "direct_answer"
    },
    {
      "doc_id": "doc_203",
      "claim": "Bhima protected Arjuna's position on Day 12",
      "relevance": "direct_answer"
    },
    {
      "doc_id": "doc_301",
      "claim": "Both continued support through Days 12-13",
      "relevance": "supporting_evidence"
    }
  ],
  "confidence": 0.92,
  "retrieval_trace": {
    "hops": 3,
    "total_docs_considered": 42,
    "final_context_docs": 10,
    "semantic_contribution": 0.25,
    "graph_contribution": 0.50,
    "temporal_contribution": 0.25
  }
}
```

Why This Works: Key Advantages
1. Relationship-Aware
The graph expansion surfaced Krishna and Bhima as key allies before semantic search ever had to rank them. Without the graph:
- Krishna might be rank 15 (mentioned in many contexts)
- Bhima might be rank 30 (less semantically similar to “help”)
2. Temporal-Sensitive
The temporal filter (“after Day 10”) ensured we only considered Days 11+. Without this:
- Documents about earlier days would score highly
- “Helped before Day 10” would confuse the answer
3. Multi-Hop Context
Different information came from different hops:
- Hop 1 found general battle narratives
- Hop 2 found specific ally actions (via graph)
- Hop 3 found exact phrases about helping
4. Explainable & Traceable
Every fact in the answer can be traced:
- “Krishna provided counsel” ← doc_156 (rank 2, score 0.85)
- “Bhima protected position” ← doc_203 (rank 1, score 0.87)
- Why ranked this way: graph_ally + temporal_match + exact_match
Performance Characteristics
Latency Breakdown
For the example query:
```text
Latency Profile
----------------------------------------------
Query Parsing:            10ms
Semantic Expansion:       15ms
Ontology Expansion:      120ms  (graph DB)
Hop 1 (Vector):           45ms
Hop 2 (3 queries):       130ms  (parallel)
Hop 3 (Exact):            25ms
Deduplication:             5ms
Fusion Scoring:           30ms
Cross-Encoder Rerank:    180ms  (optional)
Prompt Construction:      10ms
LLM Generation:         2500ms
----------------------------------------------
TOTAL (without rerank):  ~2.9s
TOTAL (with rerank):     ~3.1s
```

Optimization Strategies
- Parallel Execution: Hops 1, 2, and 3 can run concurrently
- Caching: Cache ontology expansions for common entities
- Early Stopping: If Hop 1 has high-confidence exact matches, skip Hop 3
- Adaptive Depth: Simple queries use fewer hops
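The first strategy is easy to sketch: since the hops are independent reads against the database, they can run in a thread pool. This is an illustrative pattern, not Ragforge's code, and the hop function names are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

def run_hops_parallel(query, hop_fns):
    """Run independent retrieval hops concurrently and flatten the results.
    Each hop_fn takes the query and returns a list of documents."""
    with ThreadPoolExecutor(max_workers=len(hop_fns)) as pool:
        futures = [pool.submit(fn, query) for fn in hop_fns]
        results = []
        # Iterating in submit order keeps hop ordering stable for fusion
        for fut in futures:
            results.extend(fut.result())
    return results
```

With the latency profile above, running Hops 1-3 in parallel caps their combined cost at the slowest hop (~130ms) instead of their sum (~200ms).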
What’s Coming in Part 3
We’ve seen HOW the retrieval engine works. But how do you build this as a scalable, production-ready system?
In Part 3, we’ll explore:
- Microservices architecture: Why and how
- All 7 services breakdown with code examples
- Service-to-service communication patterns
- Why we chose pgvector, FastAPI, and Neo4j
- Design tradeoffs and alternatives considered
- Data schemas and API contracts
You’ll see the actual implementation architecture that makes this retrieval pipeline production-ready.
Key Takeaways
- Multi-hop retrieval is orchestrated, not sequential - layers inform each other
- Fusion scoring combines multiple signals - semantic, graph, temporal, exact
- Each hop serves a purpose: Hop 1 (recall), Hop 2 (relationships), Hop 3 (precision)
- The graph guides semantic search, not replaces it
- Explainability comes from tracking: every score, every hop, every decision
Continue the Series
Part 1: Beyond Vector Search ← Part 2 (you are here) → Part 3: Microservices Architecture
Coming Up Next: We’ll show you how this retrieval pipeline is implemented as a scalable microservices system with clear service boundaries, API contracts, and production-ready deployment.