Embedding
Meaning, encoded as numbers
A vector representation that captures the meaning of a piece of text, image, or audio.
An embedding model takes something like the word "king" and outputs a fixed-length vector. That vector encodes the meaning — embeddings for "king" and "queen" are close together.
Semantic search, recommendations, clustering, RAG — all powered by embeddings. Modern models: OpenAI text-embedding-3, Cohere Embed v3, Voyage, BGE, Jina.
Choice matters: language support, dimension (256–3072), context window (max input tokens), and cost ($0.02-$0.13 per 1M tokens) differ. Wrong model = bad retrieval = bad RAG.
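Dimension is sometimes tunable rather than fixed: OpenAI's text-embedding-3 models accept a dimensions parameter that shortens the output vector, trading a little retrieval quality for cheaper storage and faster search. A quick sketch (the input string is illustrative; requires the openai package and an API key):

from openai import OpenAI

client = OpenAI()
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="Friday's release exploded",
    dimensions=256,  # default for this model is 1536
)
print(len(resp.data[0].embedding))  # 256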
You compress a 12-megapixel photo down to 1536 numbers, but those 1536 still carry enough info to answer "is this a cat or a dog?". Embeddings do exactly that: throw away the detail, keep the meaning. Lossy compression — but it loses the right thing.
A Slack message archive. You search "I mentioned the deploy failure last month." Classic search returns messages containing both "deploy" and "failure" — but not "Friday's release exploded" (different words, same meaning).
Embedding-based search: turn your query into a 1536-dim vector; every message already has a vector. Cosine similarity returns the 20 closest — "release exploded" included.
from openai import OpenAI
import numpy as np

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v_king = embed("king")
v_queen = embed("queen")
v_potato = embed("potato")

print(cosine(v_king, v_queen))   # ≈ 0.78 (close in meaning)
print(cosine(v_king, v_potato))  # ≈ 0.23 (far apart)

When to use
- Semantic search (beyond full-text; see the sketch after this list)
- Indexing chunks for RAG
- Recommendation systems (user/item similarity)
- Duplicate detection (near-duplicates, not exact)
- Using embeddings as input features for classification
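To turn the pairwise check above into the Slack-style search described earlier: embed each message once, store the vectors, then rank them against the query vector and keep the top k. A minimal in-memory sketch, reusing embed and cosine from above; the messages list is illustrative:

messages = [
    "Friday's release exploded",
    "lunch options near the office?",
    "deploy pipeline is green again",
]
index = [(m, embed(m)) for m in messages]  # embed once, store alongside the text

def search(query: str, k: int = 20) -> list[tuple[str, float]]:
    q = embed(query)
    scored = [(m, cosine(q, v)) for m, v in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

for msg, score in search("I mentioned the deploy failure last month", k=3):
    print(f"{score:.2f}  {msg}")

This linear scan is fine for a few thousand vectors; beyond that, a vector database or approximate-nearest-neighbor index does the same ranking without touching every row.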
When not to use
- Exact string matches (slug, ID, username)
- Structured queries (WHERE created_at > X)
- When decisions must be explainable
- Re-embedding rarely-changing data with a costly model on every call (cache the vectors; sketch below)
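A minimal caching sketch using functools.lru_cache around the embed helper from above; a production system would persist vectors in a database or vector store instead of process memory:

from functools import lru_cache

@lru_cache(maxsize=100_000)
def embed_cached(text: str) -> tuple[float, ...]:
    # Tuples are hashable and immutable, so they are safe to cache and share.
    return tuple(embed(text))

embed_cached("Friday's release exploded")  # calls the API
embed_cached("Friday's release exploded")  # served from the cache, no API call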
Embed and forget
Upgrading the embedding model means re-embedding everything. Old and new vectors aren't in the same space. Plan migration up front.
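One way to keep that migration tractable is to record which model produced each vector, so a re-embedding job can find stale rows later. A minimal sketch with a hypothetical record type:

from dataclasses import dataclass

EMBEDDING_MODEL = "text-embedding-3-small"  # current model, stored with every vector

@dataclass
class StoredEmbedding:
    text: str
    model: str
    vector: tuple[float, ...]

def needs_reembedding(row: StoredEmbedding) -> bool:
    # Vectors from different models live in different spaces:
    # never compare them directly, re-embed instead.
    return row.model != EMBEDDING_MODEL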
Underestimating chunking
Embedding a whole document as one vector loses detail. You don't want to compress 50 pages into 1 vector. Proper chunking is critical.
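A minimal fixed-size chunker with overlap, counting in words for simplicity; real pipelines usually split on sentence or token boundaries and tune sizes per model:

def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

# One vector per chunk, not one per document:
# vectors = [embed(c) for c in chunk(document_text)]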
Embedding non-English with English-only models
An English-trained model is weak on other languages. Prefer multilingual models like Cohere multilingual, BGE-m3.
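As a sketch, BGE-m3 can run locally through the sentence-transformers library (assuming the package is installed and the model weights download on first use); this reuses the cosine helper from above:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")
vecs = model.encode(["deploy failure", "échec du déploiement"])
print(cosine(vecs[0], vecs[1]))  # cross-lingual pairs should land close together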