Reranker
The second-pass ranker
A secondary layer that takes coarse vector-search results and reorders them with a more precise model — surfacing the truly relevant ones.
Vector search is fast but coarse: cosine similarity over millions of vectors returns a top-50 in sub-second time. The problem: the top 5 of those 50 aren't always ordered correctly for the user, because bi-encoder embeddings encode the query and the document separately and miss their deep interaction.
A reranker takes those 50 results and feeds each one to a cross-encoder model as a (query, doc) pair. The cross-encoder reads the pair jointly and produces a much more accurate relevance score. Result: top-50 in, a precisely reranked top-5 out.
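The two-stage shape can be sketched in a few lines. This is a minimal illustration, not any library's API: `bi_encoder_score` and `cross_encoder_score` are hypothetical stand-ins (cosine similarity and crude token overlap) for real embedding and cross-encoder models.

```python
import math

def bi_encoder_score(query_vec, doc_vec):
    # First pass: cosine similarity between independently computed vectors.
    dot = sum(q * d for q, d in zip(query_vec, doc_vec))
    norm = math.sqrt(sum(q * q for q in query_vec)) * math.sqrt(sum(d * d for d in doc_vec))
    return dot / norm

def cross_encoder_score(query, doc):
    # Second pass: a real cross-encoder reads (query, doc) jointly.
    # Hypothetical stand-in: token overlap as a crude joint signal.
    q_tokens, d_tokens = set(query.lower().split()), set(doc.lower().split())
    return len(q_tokens & d_tokens) / max(len(q_tokens), 1)

def retrieve_then_rerank(query, query_vec, corpus, k_coarse=50, k_final=5):
    # corpus: list of (doc_text, doc_vec) pairs.
    # Stage 1: cheap vector search over everything, keep k_coarse candidates.
    coarse = sorted(corpus, key=lambda d: bi_encoder_score(query_vec, d[1]),
                    reverse=True)[:k_coarse]
    # Stage 2: expensive joint scoring over the candidates only.
    reranked = sorted(coarse, key=lambda d: cross_encoder_score(query, d[0]),
                      reverse=True)
    return [doc for doc, _ in reranked[:k_final]]
```

The design point is the asymmetry: the cheap scorer runs over the whole corpus, the expensive one only over the shortlist.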
Common tools: Cohere Rerank, bge-reranker, Voyage rerank, mxbai-rerank. A reranker can noticeably boost RAG retrieval quality, with gains often cited in the 20-30% range.
In a library, first check the catalog (vector search — fast, rough) and pull 50 titles. Then have an editor go through them and say "these 5 are actually closest to your topic" (reranker — slow, precise). The catalog alone isn't enough; the editor layer lifts quality.
A SaaS docs RAG. User asks: "how do I bypass the API rate limit?" Vector search returns 50 chunks; top 5:
1. "Rate limit basics" (0.91)
2. "Pricing tiers" (0.88)
3. "Authentication" (0.85)
4. "Throttling guide" (0.82) ← actual answer here
5. "Plan comparison" (0.80)
The reranker scores all 50 against the query as (query, doc) pairs; "Throttling guide" wins:
1. "Throttling guide" (0.96)
2. "Rate limit basics" (0.92)
3. "Plan comparison" (0.78)
...
LLM gets the right chunk first; answer quality jumps noticeably.
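The reorder itself is just a sort by the second-pass score. A minimal sketch using the scores from the example above (the two scores not shown in the text are illustrative assumptions):

```python
# Cross-encoder scores per candidate chunk. The first three come from the
# worked example; "Pricing tiers" and "Authentication" are illustrative.
reranker_scores = {
    "Throttling guide": 0.96,
    "Rate limit basics": 0.92,
    "Plan comparison": 0.78,
    "Pricing tiers": 0.40,
    "Authentication": 0.35,
}

# The order vector search produced (bi-encoder cosine similarity).
vector_top5 = ["Rate limit basics", "Pricing tiers", "Authentication",
               "Throttling guide", "Plan comparison"]

# Rerank: sort candidates by the cross-encoder score, keep the top 3 for the LLM.
reranked = sorted(vector_top5, key=lambda c: reranker_scores[c], reverse=True)[:3]
print(reranked)  # ['Throttling guide', 'Rate limit basics', 'Plan comparison']
```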
When to use
- RAG retrieval quality is poor — reranker is the first intervention
- Top-K is large (50-100) but LLM context fits few (5-10)
- High-stakes domains (legal, medical) — wrong chunk = wrong answer
- Multilingual search — vectors alone miss linguistic nuance
When to skip
- Top-3 is already 95% correct — extra layer not worth it
- Latency-critical — rerankers add 100-500ms
- Tiny corpus (<1K chunks) — vector search alone is fine
Cross-encoders are slow
Bi-encoders (vector search) encode documents once, offline; a cross-encoder runs a fresh forward pass for every (query, doc) pair at query time. 50 results = 50 forward passes. Batching and a smaller model are the key mitigations.
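Batching amortizes those forward passes: instead of one model call per pair, group the pairs and run one call per batch. A sketch under that assumption, where `score_batch` is a hypothetical stand-in for one batched cross-encoder forward pass:

```python
def score_batch(pairs):
    # Hypothetical stand-in for a single batched cross-encoder forward pass;
    # a real model returns one relevance score per (query, doc) pair.
    return [float(len(set(q.split()) & set(d.split()))) for q, d in pairs]

def rerank_batched(query, docs, batch_size=16):
    pairs = [(query, d) for d in docs]
    scores, calls = [], 0
    for i in range(0, len(pairs), batch_size):
        scores.extend(score_batch(pairs[i:i + batch_size]))  # one pass per batch
        calls += 1
    ranked = sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)
    return ranked, calls

# 50 candidates at batch_size=16 → 4 batched passes instead of 50 single ones.
```

Batch size trades latency against GPU memory; tune it to the model size and hardware.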
Wrong model choice
Multilingual rerankers are less optimized for English, and monolingual ones are weak on non-English text. Match the model to your data; Cohere Rerank v3, for example, is multilingual and production-ready.
Reranker isn't a silver bullet
First good chunking, then good retrieval, then the reranker. Don't use a reranker to paper over bad chunking or retrieval; fix the earlier layers first.