AI Dictionary
Advanced · ~2 min read · #rag-fusion #rrf #multi-query

RAG-Fusion

Multi-query + fused ranking

An advanced RAG technique that generates multiple query variants, runs a separate search for each, and merges results into a single ranked list.

[Figure: Multiple queries + fused ranking. One user question ("GDPR compliance?") → LLM generates variants ("GDPR Article 17", "data protection rules", "compliance workflow") → RRF merges all results → broader coverage, more robust ranking. From one user question to five different searches by the model.]
Definition

Classic RAG's weakness: one query, one search. "GDPR compliance" and "data protection rules" mean much the same thing, yet vector similarity retrieves different chunks for each wording; a single query sees only one slice of the corpus, and many relevant results are missed.

RAG-Fusion fixes this:

1. Have an LLM generate 3-5 query variants from the original (paraphrase, sub-question, synonym).
2. Search each variant separately (vector or hybrid).
3. Merge all results with Reciprocal Rank Fusion (RRF).
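The three steps can be sketched end to end; `generate_variants()` and `search()` here are hypothetical stand-ins for your LLM client and retriever:

```python
def rag_fusion(question, generate_variants, search, k=60):
    """Minimal RAG-Fusion sketch: fan out variant searches, fuse with RRF."""
    queries = [question] + generate_variants(question)  # 1. 3-5 query variants
    result_lists = [search(q) for q in queries]         # 2. one search per variant
    scores = {}
    for results in result_lists:                        # 3. RRF merge
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that shows up in several of the variant searches accumulates score from each list, so it rises above documents that only one phrasing found.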

RRF scoring: a doc at position n in a given search gets score 1/(k+n), with k typically 60. Sum each doc's scores across all searches and sort by the total. No score-normalization headaches: simple and effective.
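The scoring rule as a standalone function, with a tiny worked example (a minimal sketch, independent of any retriever):

```python
from collections import defaultdict

def rrf_merge(result_lists, k=60):
    """Sum 1 / (k + rank) per document across ranked lists (rank is 1-based)."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" sits near the top of both lists, so it outranks "a",
# which is first in one list but only third in the other:
print(rrf_merge([["a", "b", "c"], ["b", "d", "a"]]))  # ['b', 'a', 'd', 'c']
```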

Result: docs a single query would have missed (different wording, different angles) get caught. Recall typically improves by a reported 30-50%, with precision usually preserved.

Analogy

Like searching a newspaper archive. Instead of one query, "presidential election," try variants: "presidential race," "vote counts," "campaign trail." Count how often each article turns up; the most frequent ones are the most relevant. That's RAG-Fusion, with the LLM generating the variants for you.

Real-world example

A SaaS company's internal-docs RAG. User: "I want to upgrade to Pro, how does billing work?" Classic RAG zeroes in on "Pro plan", "upgrade", "billing" and returns 5 chunks that are mostly pricing-tier content; the billing procedure may be missed entirely.

With RAG-Fusion the LLM produces 4 variants:

1. "Plan upgrade billing process"
2. "Pro tier subscription invoice"
3. "Account upgrade payment"
4. "Plan switch billing cycle"

Each variant returns 5 results; RRF merges them into the top 8 chunks: pricing info, payment procedure, billing cycle, and refund policy together. The LLM can now give a comprehensive answer; recall rises sharply and hallucination drops noticeably.

Token cost is ~3-4× higher but the quality jump is significant.

When to use
  • Complex or ambiguous user queries (not single-meaning)
  • Multilingual content (TR-EN mix, variants catch both)
  • Large corpus (>10K docs) — coverage matters
  • When recall is critical (legal/medical research)
  • Combine with reranker — fusion + reranker hits highest precision
When not to use
  • Latency-critical — variant generation + N searches is expensive
  • Tight token budget (multiple LLM calls)
  • Clear, narrow query — fusion overhead not worth it
  • Single-language, single-domain — basic RAG suffices
Common pitfalls

Variant generation can hallucinate too

An LLM asked to "generate query variants" can drift off-topic ("Pro plan" → "membership signup steps"). Use few-shot examples in the variant prompt and spot-check output quality.
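A hypothetical few-shot variant-generation prompt; the worked example anchors the model to on-topic paraphrases instead of letting it drift:

```python
# Hypothetical prompt template; the example question and variants are
# illustrative, not from any specific library or product.
VARIANT_PROMPT = """Generate 4 search-query variants of the user question.
Stay on the same topic; paraphrase, split into sub-questions, or swap synonyms.

Question: How do I reset my password?
Variants: password reset steps | account recovery email | change login credentials | forgot password flow

Question: {question}
Variants:"""

prompt = VARIANT_PROMPT.format(question="How does Pro plan billing work?")
```

Keeping the output on one pipe-delimited line, as in the example, also makes the variants trivial to parse and to validate before searching.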

Wrong RRF k parameter

k=60 is the standard default, but on very large corpora a lower k (10-20) gives sharper ordering. Tune to your domain and A/B test.
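A quick way to see what k does: compare the score gap between a rank-1 and a rank-2 hit at different k values.

```python
def rrf_score(rank, k):
    """Score contribution of a hit at a given 1-based rank."""
    return 1.0 / (k + rank)

# Gap between a rank-1 and a rank-2 hit:
gap_k10 = rrf_score(1, 10) - rrf_score(2, 10)  # ~0.0076
gap_k60 = rrf_score(1, 60) - rrf_score(2, 60)  # ~0.0003
# Lower k rewards top-ranked hits far more strongly, sharpening the ordering;
# higher k flattens the curve and lets agreement across lists dominate.
```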

Fusion + reranker doubled cost

Using both fusion and a reranker is powerful but roughly doubles latency and cost. Measure in production what you actually need; sometimes one is enough.