RAG-Fusion
Multi-query + fused ranking
An advanced RAG technique that generates multiple query variants, runs a separate search for each, and merges results into a single ranked list.
Classic RAG's weakness: one query, one search. "GDPR compliance" and "data protection rules" mean nearly the same thing, but a single vector search retrieves only the 5 chunks closest to the user's exact phrasing; relevant documents worded the other way get missed.
RAG-Fusion fixes this: 1. Have an LLM generate 3-5 query variants from the original (paraphrase, sub-question, synonym). 2. Search each variant separately (vector or hybrid). 3. Merge all results with Reciprocal Rank Fusion (RRF).
RRF scoring: a doc at rank n in a given search gets score 1/(k+n), with k usually 60. Sum the scores across all searches and sort by the total. No score-normalization headache; simple, effective.
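The scoring above is a few lines of plain Python; a minimal sketch (doc IDs are illustrative):

```python
from collections import defaultdict

def rrf_merge(result_lists, k=60):
    """Reciprocal Rank Fusion: each input list ranks doc IDs best-first.
    A doc at rank n (1-based) in a list contributes 1/(k+n); scores are
    summed across lists and the fused list is sorted by total score."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "billing-faq" appears high in both searches, so it comes out on top
# even though each individual search also surfaced other docs.
fused = rrf_merge([
    ["billing-faq", "pricing", "upgrade-guide"],
    ["refunds", "billing-faq", "pricing"],
])
# fused[0] == "billing-faq"
```

Note that agreement across searches is what RRF rewards: a doc ranked well in several variant searches accumulates score from each of them.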
Result: docs that one query would have missed (different words, different angles) get caught. Recall up 30-50%, precision usually preserved.
An analogy: searching a newspaper archive. Instead of one query, "presidential election," try variants: "presidential race," "vote counts," "campaign trail." Count how often each article surfaces; the most frequent ones are the most relevant. That's RAG-Fusion, except the LLM generates the variants for you.
A SaaS internal-docs RAG. User: "I want to upgrade to Pro, how does billing work?" Classic RAG embeds that one phrasing ("Pro plan", "upgrade", "billing") and returns 5 chunks, mostly pricing-tier content; the billing procedure may be missed.
With RAG-Fusion the LLM produces 4 variants: 1. "Plan upgrade billing process" 2. "Pro tier subscription invoice" 3. "Account upgrade payment" 4. "Plan switch billing cycle"
Each returns 5 results, RRF merges → top 8 chunks: pricing info + payment procedure + billing cycle + refund policy together. LLM gives a comprehensive answer. Recall boost is large, hallucination drops noticeably.
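The whole flow can be sketched end-to-end. Here the LLM call and the vector search are stubbed with toy stand-ins (`generate_variants`, `search_stub`, and the corpus are all hypothetical), so only the fusion logic is real:

```python
from collections import defaultdict

# Toy corpus standing in for the SaaS docs index (illustrative).
CORPUS = {
    "pricing-tiers": "pro plan pricing tier comparison",
    "billing-cycle": "billing cycle invoice schedule",
    "upgrade-howto": "account upgrade payment process",
    "refund-policy": "refund policy for plan changes",
}

def generate_variants(query):
    # Stand-in for the LLM call that paraphrases the query.
    return [
        "plan upgrade billing process",
        "pro tier subscription invoice",
        "account upgrade payment",
        "plan switch billing cycle",
    ]

def search_stub(query, top_k=3):
    # Stand-in for a vector search: rank docs by word overlap.
    q = set(query.split())
    ranked = sorted(CORPUS, key=lambda d: -len(q & set(CORPUS[d].split())))
    return ranked[:top_k]

def rag_fusion(query, k=60):
    # One search per variant (plus the original), merged with RRF.
    scores = defaultdict(float)
    for variant in [query] + generate_variants(query):
        for rank, doc_id in enumerate(search_stub(variant), start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rag_fusion("I want to upgrade to Pro, how does billing work?")
```

Even in this toy setup the fused list covers all four docs, including ones the original phrasing alone would not have surfaced in its top results.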
Token cost is ~3-4× higher but the quality jump is significant.
Use when:
- Complex or ambiguous user queries (not single-meaning)
- Multilingual content (e.g. a Turkish-English mix; variants catch both languages)
- Large corpus (>10K docs) — coverage matters
- When recall is critical (legal/medical research)
- Combine with reranker — fusion + reranker hits highest precision
Avoid when:
- Latency-critical: variant generation + N searches is expensive
- Tight token budget (multiple LLM calls)
- Clear, narrow query — fusion overhead not worth it
- Single-language, single-domain — basic RAG suffices
Pitfall: variant generation can hallucinate too
An LLM asked to generate query variants can drift off-topic ("Pro plan" → "membership signup steps"). Use few-shot examples in the variant prompt and spot-check variant quality.
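One way to anchor the generator is a few-shot prompt template; the wording and examples below are illustrative, not a fixed recipe:

```python
# Few-shot prompt to keep variants on-topic (illustrative wording).
VARIANT_PROMPT = """Generate 4 search-query variants of the user's query.
Stay on the same topic; paraphrase, split into sub-questions, or swap synonyms.

Query: how do I cancel my subscription?
Variants:
1. subscription cancellation steps
2. stop recurring billing
3. end plan before renewal
4. cancel account subscription

Query: {query}
Variants:"""

def build_variant_prompt(query):
    # Fill in the user's query; the model completes the numbered list.
    return VARIANT_PROMPT.format(query=query)
```

The worked example shows the model the expected granularity, which cuts down on off-topic variants.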
Pitfall: wrong RRF k parameter
k=60 is the common default, but on very large corpora a lower k (10-20) gives sharper ordering by weighting top ranks more heavily. Tune k to your domain and A/B test.
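The effect of k is easy to see numerically: smaller k widens the score gap between adjacent ranks, so top-ranked docs dominate the fused total. A quick check:

```python
def rrf_score(rank, k):
    # Per-search RRF contribution for a doc at the given 1-based rank.
    return 1.0 / (k + rank)

# Relative advantage of rank 1 over rank 5 at two k values:
sharp = rrf_score(1, 10) / rrf_score(5, 10)  # k=10 -> 15/11, about 1.36x
flat = rrf_score(1, 60) / rrf_score(5, 60)   # k=60 -> 65/61, about 1.07x
```

With k=10 a top-1 hit is worth roughly 36% more than a rank-5 hit; with k=60 the difference shrinks to about 7%, so broad agreement across searches matters more than any single top rank.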
Pitfall: fusion + reranker doubles cost
Combining fusion with a reranker is powerful but roughly doubles latency and cost. Measure in production what's actually needed; sometimes one is enough.