BM25
Best Matching 25 — classic relevance score
A classic formula that scores how relevant a document is to a query in keyword search. An evolved version of TF × IDF.
BM25 (Best Matching 25) is a relevance-scoring formula developed by Stephen Robertson et al. in 1994, and still the core of traditional search engines. Even in the AI era, it's a key piece of hybrid search and RAG systems.
Built from two components:
- TF (Term Frequency): how many times does the query word appear in the document? More = more relevant.
- IDF (Inverse Document Frequency): how rare is the word across all documents? Rarer = more distinctive.
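The two components can be sketched in a few lines of Python. This is a minimal illustration, not any engine's actual code: the toy corpus and the function names are hypothetical, and the IDF uses the smoothed form commonly seen in BM25 implementations.

```python
import math
from collections import Counter

# Hypothetical toy corpus; each document is already tokenized and lowercased.
corpus = [
    ["gdpr", "erasure", "request", "gdpr"],
    ["gdpr", "fines", "report"],
    ["cookie", "banner", "report"],
]

def term_frequency(term, doc_tokens):
    """Raw count of `term` in a single document."""
    return Counter(doc_tokens)[term]

def inverse_document_frequency(term, docs):
    """Smoothed BM25-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    n_docs = len(docs)
    n_containing = sum(1 for d in docs if term in d)
    return math.log(1 + (n_docs - n_containing + 0.5) / (n_containing + 0.5))

print(term_frequency("gdpr", corpus[0]))              # count in the first doc
print(inverse_document_frequency("gdpr", corpus))     # common term, lower IDF
print(inverse_document_frequency("erasure", corpus))  # rarer term, higher IDF
```

"erasure" appears in one document and "gdpr" in two, so "erasure" gets the larger IDF.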
BM25 adds two refinements over naive TF×IDF: term saturation (each extra repetition of a word adds less, so repeating it endlessly is not rewarded) and length normalization (a match in a short document counts for more than the same match in a long one).
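Both refinements live in the TF part of the formula. The sketch below assumes the commonly used defaults k1 = 1.2 and b = 0.75; the function name is hypothetical.

```python
def bm25_term_weight(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """TF part of BM25 for one term; multiply by IDF to get the term's score."""
    # b controls how strongly document length is penalized (0 = not at all).
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences stop adding weight.
    return tf * (k1 + 1) / (tf + k1 * length_norm)

# Saturation: the weight grows with tf but stays below k1 + 1.
for tf in (1, 2, 5, 50):
    print(tf, round(bm25_term_weight(tf, doc_len=100, avg_doc_len=100), 3))

# Length normalization: same tf, longer document, lower weight.
short_doc = bm25_term_weight(3, doc_len=50, avg_doc_len=100)
long_doc = bm25_term_weight(3, doc_len=500, avg_doc_len=100)
print(short_doc > long_doc)
```

Setting b = 0 switches length normalization off; a larger k1 lets repeated terms keep adding weight for longer.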
Elasticsearch, OpenSearch, Lucene, Solr — all use BM25 as the default.
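PostgreSQL's built-in full-text search (tsvector with ts_rank) is related but not the same: it ranks by term frequency and, with ts_rank_cd, proximity. It does not use corpus-wide IDF, so it is not BM25 out of the box.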
Think of it as a mathematical way to judge how relevant a book is to your query. "TF": how many pages mention "cancer"? Many → that's the topic. "IDF": is "cancer" rare in the whole library, or in every book? If it is everywhere, it doesn't differentiate. Together: a relevance score.
Query: "GDPR Article 17 right to erasure"
Corpus: 10,000 legal docs.
BM25 (simplified):
- Doc A: "GDPR" appears (TF=8, high IDF), "Article 17" appears (TF=3, very high IDF), "erasure" missing → score: 12.4
- Doc B: "GDPR" present (TF=2), "erasure" present (TF=15), no "Article 17" → score: 6.8
- Doc C: all terms present but a 500-page general report (long) → score: 9.1 (length adjustment)
Order: A > C > B. Doc A wins thanks to exact-term match + rare-IDF bonus + reasonable length.
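A complete scorer fits in a short function. The sketch below is a minimal BM25 under the usual defaults (k1 = 1.2, b = 0.75), run on a hypothetical three-document toy corpus that only loosely mirrors Docs A, B and C. It will not reproduce the illustrative scores above, which assume a 10,000-document corpus.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document (docs are lists of tokens)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        counts = Counter(d)
        length_norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for t in query_tokens:
            tf = counts[t]
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Hypothetical toy documents, tokenized by whitespace.
docs = {
    "A": "gdpr gdpr article 17 article 17 controller duties".split(),
    "B": "gdpr erasure erasure erasure request form".split(),
    "C": ("gdpr article 17 erasure " + "background " * 40).split(),
}
query = "gdpr article 17 right to erasure".split()

scores = bm25_scores(query, list(docs.values()))
for name, score in sorted(zip(docs, scores), key=lambda p: p[1], reverse=True):
    print(name, round(score, 3))
```

Note that this tokenizer splits "Article 17" into two separate tokens; treating it as a single exact phrase would need phrase matching on top of BM25.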
When to use
- Hybrid search: always pair BM25 with vector search
- Exact-term search (code, SKUs, legal article numbers)
- Low-resource environments: no embedding model required
- Explainable search: you can show why a result ranked
- Multilingual: works on any language you can tokenize, independent of an embedding model's language coverage
When not to use
- Need semantic similarity only (synonyms, paraphrase)
- Word order matters (BM25 treats words as a bag)
- Very short docs (TF becomes meaningless)
- Users with typos (BM25 has no built-in typo tolerance)
Insufficient on its own (in modern context)
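BM25 matches tokens, not meaning: a query for 'uyum' (Turkish for 'compliance') will not return a document that only says 'compliance'. Hybrid search (BM25 + vector) is standard practice; relying on BM25 alone ignores some 30 years of progress in retrieval.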
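One common way to combine the two result lists is Reciprocal Rank Fusion (RRF). It is shown here as an assumption, since the text does not prescribe a fusion method. The document IDs are hypothetical, and k = 60 is the value conventionally used with RRF.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs; each hit adds 1 / (k + rank)."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical top-3 results from each retriever.
bm25_ranking = ["doc_a", "doc_c", "doc_b"]
vector_ranking = ["doc_d", "doc_a", "doc_c"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)  # ['doc_a', 'doc_c', 'doc_d', 'doc_b']
```

RRF uses only ranks, so BM25 scores and vector similarities never need to be put on a common scale.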
Forgetting stop words
Words like English 'the', 'a' or Turkish 'bir', 'ile' have high TF and low IDF. The low IDF already shrinks their contribution, but with a misconfigured analyzer they still add noise to the ranking. Use language-specific stopword lists.
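A minimal sketch of per-language stop-word filtering before indexing and querying. The lists here are tiny hypothetical samples; real analyzers ship much longer ones.

```python
# Hypothetical, deliberately tiny stop-word lists keyed by language code.
STOPWORDS = {
    "en": {"the", "a", "an", "of", "to"},
    "tr": {"bir", "ile", "ve"},
}

def remove_stopwords(tokens, lang):
    """Drop tokens found in the stop-word list for `lang` (case-insensitive)."""
    stop = STOPWORDS.get(lang, set())
    return [t for t in tokens if t.lower() not in stop]

print(remove_stopwords("the right to erasure".split(), "en"))
# ['right', 'erasure']
```

The same filter has to run on both documents and queries, otherwise the two sides disagree about which tokens exist.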
Skipping stemming
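If 'run', 'running', and 'ran' are indexed as different tokens, each gets a lower TF, and a query for one form misses documents that use another. Stemmers (Snowball; Zemberek for Turkish) are essential. Irregular forms such as 'ran' usually need lemmatization or a dictionary rather than suffix stripping.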
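The effect on TF can be seen with a toy normalizer. This is a deliberately crude suffix stripper with a hypothetical lookup table for irregular forms, not Snowball's or Zemberek's algorithm, and it will mangle many real words. It only illustrates why normalization raises TF.

```python
from collections import Counter

# Hypothetical lookup for irregular forms; a real lemmatizer supplies this.
IRREGULAR = {"ran": "run"}

def toy_normalize(token):
    """Crude normalization: lowercase, irregular lookup, strip -ing / -s."""
    token = token.lower()
    if token in IRREGULAR:
        return IRREGULAR[token]
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        # Undo consonant doubling: "runn" -> "run".
        if len(token) > 2 and token[-1] == token[-2]:
            token = token[:-1]
    elif token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        token = token[:-1]
    return token

tokens = "Run running ran runs".split()
raw_tf = Counter(t.lower() for t in tokens)
normalized_tf = Counter(toy_normalize(t) for t in tokens)

print(raw_tf["run"])         # 1: each surface form is counted separately
print(normalized_tf["run"])  # 4: all forms collapse to one token
```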