AI Dictionary
Advanced · ~2 min read · #rag-fusion #rrf #multi-query

RAG-Fusion

Multi-query + fused ranking

An advanced RAG technique that generates multiple query variants, runs a separate search for each, and merges results into a single ranked list.

[Figure: Multiple queries + fused ranking. One user question ("GDPR compliance?") → LLM generates variants ("GDPR Article 17", "data protection rules", "compliance workflow") → RRF merges all results → broader coverage, more robust ranking. From one user question to five different searches by the model.]
Definition

Classic RAG's weakness: one query, one search. "GDPR compliance" and "data protection rules" mean much the same thing, yet vector similarity retrieves different chunks for each wording; a single query sees only one slice of the corpus, and many relevant results are missed.

RAG-Fusion fixes this:

1. Have an LLM generate 3-5 query variants from the original (paraphrase, sub-question, synonym).
2. Search each variant separately (vector or hybrid).
3. Merge all results with Reciprocal Rank Fusion (RRF).
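The three steps can be sketched end to end; `generate_variants()` and `search()` here are hypothetical stand-ins for your LLM client and retriever:

```python
def rag_fusion(question, generate_variants, search, k=60):
    """Minimal RAG-Fusion sketch: fan out variant searches, fuse with RRF."""
    queries = [question] + generate_variants(question)  # 1. 3-5 query variants
    result_lists = [search(q) for q in queries]         # 2. one search per variant
    scores = {}
    for results in result_lists:                        # 3. RRF merge
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that shows up in several of the variant searches accumulates score from each list, so it rises above documents that only one phrasing found.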

RRF scoring: a doc at position n in a given search gets score 1/(k+n), with k typically 60. Sum each doc's scores across all searches and sort by the total. No score-normalization headaches: simple and effective.
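The scoring rule as a standalone function, with a tiny worked example (a minimal sketch, independent of any retriever):

```python
from collections import defaultdict

def rrf_merge(result_lists, k=60):
    """Sum 1 / (k + rank) per document across ranked lists (rank is 1-based)."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" sits near the top of both lists, so it outranks "a",
# which is first in one list but only third in the other:
print(rrf_merge([["a", "b", "c"], ["b", "d", "a"]]))  # ['b', 'a', 'd', 'c']
```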

Result: docs a single query would have missed (different wording, different angles) get caught. Recall typically improves by a reported 30-50%, with precision usually preserved.

Analogy

Like searching a newspaper archive. Instead of one query, "presidential election," try variants: "presidential race," "vote counts," "campaign trail." Count how often each article turns up; the most frequent ones are the most relevant. That's RAG-Fusion, with the LLM generating the variants for you.

Real-world example

A SaaS company's internal-docs RAG. User: "I want to upgrade to Pro, how does billing work?" Classic RAG zeroes in on "Pro plan", "upgrade", "billing" and returns 5 chunks that are mostly pricing-tier content; the billing procedure may be missed entirely.

With RAG-Fusion the LLM produces 4 variants:

1. "Plan upgrade billing process"
2. "Pro tier subscription invoice"
3. "Account upgrade payment"
4. "Plan switch billing cycle"

Each variant returns 5 results; RRF merges them into the top 8 chunks: pricing info, payment procedure, billing cycle, and refund policy together. The LLM can now give a comprehensive answer; recall rises sharply and hallucination drops noticeably.

Token cost is ~3-4× higher but the quality jump is significant.

When to use
  • Complex or ambiguous user queries (not single-meaning)
  • Multilingual content (TR-EN mix, variants catch both)
  • Large corpus (>10K docs) — coverage matters
  • When recall is critical (legal/medical research)
  • Combine with reranker — fusion + reranker hits highest precision
When not to use
  • Latency-critical — variant generation + N searches is expensive
  • Tight token budget (multiple LLM calls)
  • Clear, narrow query — fusion overhead not worth it
  • Single-language, single-domain — basic RAG suffices
Common pitfalls

Variant generation can hallucinate too

An LLM asked to "generate query variants" can drift off-topic ("Pro plan" → "membership signup steps"). Use few-shot examples in the variant prompt and spot-check output quality.
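A hypothetical few-shot variant-generation prompt; the worked example anchors the model to on-topic paraphrases instead of letting it drift:

```python
# Hypothetical prompt template; the example question and variants are
# illustrative, not from any specific library or product.
VARIANT_PROMPT = """Generate 4 search-query variants of the user question.
Stay on the same topic; paraphrase, split into sub-questions, or swap synonyms.

Question: How do I reset my password?
Variants: password reset steps | account recovery email | change login credentials | forgot password flow

Question: {question}
Variants:"""

prompt = VARIANT_PROMPT.format(question="How does Pro plan billing work?")
```

Keeping the output on one pipe-delimited line, as in the example, also makes the variants trivial to parse and to validate before searching.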

Wrong RRF k parameter

k=60 is the standard default, but on very large corpora a lower k (10-20) gives sharper ordering. Tune to your domain and A/B test.
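A quick way to see what k does: compare the score gap between a rank-1 and a rank-2 hit at different k values.

```python
def rrf_score(rank, k):
    """Score contribution of a hit at a given 1-based rank."""
    return 1.0 / (k + rank)

# Gap between a rank-1 and a rank-2 hit:
gap_k10 = rrf_score(1, 10) - rrf_score(2, 10)  # ~0.0076
gap_k60 = rrf_score(1, 60) - rrf_score(2, 60)  # ~0.0003
# Lower k rewards top-ranked hits far more strongly, sharpening the ordering;
# higher k flattens the curve and lets agreement across lists dominate.
```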

Fusion + reranker doubled cost

Using both fusion and a reranker is powerful but roughly doubles latency and cost. Measure in production what you actually need; sometimes one is enough.