AI Dictionary

BM25

Best Matching 25 — classic relevance score

A classic formula that scores how relevant a document is to a query in keyword search. An evolved version of TF × IDF.

[Diagram: BM25, the TF × IDF classic. TF (Term Frequency): "GDPR" appears 4 times; more occurrences → more relevant. × IDF (Inverse Document Frequency): "GDPR" appears in 12 of 10K docs; rare = distinctive → more valuable. = relevance score 8.4 → rank near top. Elasticsearch, Lucene, OpenSearch are all built on BM25 (1994).]
Definition

BM25 (Best Matching 25) is a relevance-scoring formula developed by Stephen Robertson et al. in 1994, and still the core of traditional search engines. Even in the AI era, it's a key piece of hybrid search and RAG systems.

Built from two components:
  • TF (Term Frequency): how many times does the query word appear in the document? More = more relevant.
  • IDF (Inverse Document Frequency): how rare is the word across all documents? Rarer = more distinctive.
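To make the two components concrete, here is a minimal sketch in plain Python (the smoothed IDF variant and the toy corpus are illustrative choices, not any particular engine's implementation):

```python
import math

def tf(term, doc_tokens):
    # Raw term frequency: how often the term appears in this document.
    return doc_tokens.count(term)

def idf(term, corpus):
    # Inverse document frequency: terms that appear in few documents
    # score higher. This is a common smoothed variant.
    n_docs = len(corpus)
    df = sum(1 for doc in corpus if term in doc)
    return math.log((n_docs + 1) / (df + 1)) + 1

corpus = [
    ["gdpr", "article", "17", "erasure"],
    ["gdpr", "compliance", "report"],
    ["weather", "forecast", "today"],
]
# "weather" occurs in only one document, so it is more distinctive
# (higher IDF) than "gdpr", which occurs in two.
score = tf("gdpr", corpus[0]) * idf("gdpr", corpus)
```

Multiplying the two gives the naive TF × IDF score that BM25 refines.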

BM25 adds two refinements over naive TF×IDF: term saturation (repeating the same word yields diminishing returns rather than an unbounded score) and length normalization (a match in a short document counts for more than the same match buried in a long one).
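Both refinements are visible in a compact, self-contained sketch of the Okapi BM25 formula. This is a toy scorer for illustration, not a production implementation; k1 and b are the standard tuning parameters, and the corpus is invented:

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Toy Okapi BM25. k1 tunes term-frequency saturation,
    b tunes document-length normalization."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        tf = doc.count(term)
        if tf == 0:
            continue  # absent term contributes nothing
        df = sum(1 for d in corpus if term in d)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Length normalization: matches in long docs are worth less.
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        # Saturation: tf * (k1 + 1) / (tf + norm) flattens as tf grows.
        score += idf * tf * (k1 + 1) / (tf + norm)
    return score

docs = [["gdpr", "law"], ["gdpr", "gdpr"], ["tax", "law"]]
one_hit = bm25_score(["gdpr"], docs[0], docs)
two_hits = bm25_score(["gdpr"], docs[1], docs)
# Saturation in action: the second occurrence adds less than the first,
# so two_hits is higher than one_hit but less than double it.
```

Raising k1 makes repeated terms matter more; setting b to 0 switches length normalization off entirely.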

Elasticsearch, OpenSearch, Lucene, Solr — all use BM25 as the default. Postgres tsvector uses a similar TF-IDF approach.

Analogy

The mathematical way to gauge how relevant a book is to your query. TF: how many pages mention "cancer"? Many → that's probably the topic. IDF: does "cancer" appear in nearly every book in the library, or only a few? If it's everywhere, it doesn't differentiate. Together they yield a relevance score.

Real-world example

Query: "GDPR Article 17 right to erasure"
Corpus: 10,000 legal docs.

BM25 (simplified):
  • Doc A: "GDPR" appears (TF=8, high IDF), "Article 17" appears (TF=3, very high IDF), "erasure" missing → score: 12.4
  • Doc B: "GDPR" present (TF=2), "erasure" present (TF=15), no "Article 17" → score: 6.8
  • Doc C: all terms present, but it is a 500-page general report (long) → score: 9.1 after length adjustment

Order: A > C > B. Doc A wins thanks to exact-term match + rare-IDF bonus + reasonable length.

When to use
  • Hybrid search — always pair BM25 with vector search
  • Exact-term search (code, SKUs, legal article numbers)
  • Low-resource environments — no embedding model required
  • Explainable search — you can show why a result ranked
  • Multilingual — independent of vector-model language
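Reciprocal rank fusion (RRF) is one common way to combine a BM25 ranking with a vector-search ranking in hybrid setups. A minimal sketch, assuming two pre-computed result lists with hypothetical document ids:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists (best first) into one.
    k=60 is the conventional constant from the original RRF paper;
    it damps the influence of any single list's top ranks."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]    # keyword ranking (hypothetical)
vector_hits = ["doc_c", "doc_a", "doc_d"]  # semantic ranking (hypothetical)
# doc_a ranks well in both lists, so it rises to the top of the fusion.
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

RRF only needs ranks, not raw scores, which sidesteps the problem that BM25 scores and cosine similarities live on incompatible scales.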
When not to use
  • Need semantic similarity only (synonyms, paraphrase)
  • Word order matters (BM25 treats words as a bag)
  • Very short docs (TF becomes meaningless)
  • Users with typos (BM25 has no tolerance)
Common pitfalls

Insufficient on its own (in modern context)

BM25 matches tokens, not meanings: a query for "conformity" will never surface a document that only says "compliance". Hybrid retrieval (BM25 + vector search) is standard practice; relying on BM25 alone means ignoring 30 years of progress.

Forgetting stop words

English 'the' and 'a', or Turkish 'bir' and 'ile', have high TF and low IDF. BM25's IDF component largely neutralizes them, but a bad analyzer configuration still produces noise. Language-specific stopword lists are needed.

Skipping stemming

If 'run', 'running', and 'ran' are counted as different tokens, TF is split across them and the score drops. Stemmers (Snowball for English, Zemberek for Turkish) are essential; without them you miss relevant docs.
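The effect is easy to see with an illustration-only suffix stripper (a deliberately naive stand-in for a real stemmer such as Snowball or Zemberek):

```python
def naive_stem(token):
    # Toy suffix stripper for illustration only; production systems
    # use real stemmers (Snowball for English, Zemberek for Turkish).
    for suffix in ("ning", "ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = ["run", "running", "runs", "ran"]
stemmed = [naive_stem(t) for t in tokens]
# "run", "running", "runs" collapse to one token, tripling its TF.
# The irregular form "ran" survives untouched: suffix stripping cannot
# handle it, which is why lemmatization exists as a separate step.
```

Without this collapse, each surface form carries a TF of 1 and relevant documents sink in the ranking.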