
Embedding

Meaning, encoded as numbers

A vector representation that captures the meaning of a piece of text, image, or audio.

Definition

An embedding model takes an input such as the word "king" and outputs a fixed-length vector of numbers. That vector encodes meaning, so inputs with similar meanings land close together: the embeddings for "king" and "queen" sit near each other.

Embeddings power semantic search, recommendations, clustering, and RAG. Widely used models include OpenAI text-embedding-3, Cohere Embed v3, Voyage, BGE, and Jina.

The choice of model matters. Models differ in language support, dimension (256–3072), context window (maximum input tokens), and cost ($0.02–$0.13 per 1M tokens). The wrong model means bad retrieval, and bad retrieval means bad RAG.
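To get a feel for these trade-offs, here is a back-of-the-envelope estimator. It assumes the price range quoted above and float32 storage (4 bytes per dimension). The corpus size and the function names are hypothetical, and real prices should be checked against the provider's current price list.

```python
def embedding_cost_usd(total_tokens: int, price_per_million: float) -> float:
    """One-off cost of embedding a corpus, given a price per 1M tokens."""
    return total_tokens / 1_000_000 * price_per_million


def index_size_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone, assuming float32 (4 bytes per value) by default."""
    return num_vectors * dim * bytes_per_value


if __name__ == "__main__":
    # Hypothetical corpus: 1M chunks of ~500 tokens each.
    chunks, tokens_per_chunk = 1_000_000, 500
    total_tokens = chunks * tokens_per_chunk  # 500M tokens

    for price in (0.02, 0.13):  # ends of the price range quoted above
        print(f"${price}/1M tokens -> ${embedding_cost_usd(total_tokens, price):,.2f}")

    for dim in (256, 1536, 3072):  # dimensions within the range quoted above
        gb = index_size_bytes(chunks, dim) / 1e9
        print(f"{dim} dims -> {gb:.2f} GB of raw vectors")
```

The per-token price drives the embedding bill (paid again on every re-embedding), while the dimension drives storage and the cost of every similarity comparison.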

Analogy

Imagine compressing a 12-megapixel photo down to 1536 numbers that still carry enough information to answer "is this a cat or a dog?". Embeddings do exactly that: throw away the detail, keep the meaning. The compression is lossy, but it loses the right things.

Real-world example

Take a Slack message archive and search for "the deploy failure I mentioned last month". Classic keyword search returns messages containing both "deploy" and "failure", but misses "Friday's release exploded" (different words, same meaning).

With embedding-based search, the query is turned into a 1536-dimensional vector, and every message has already been embedded the same way. Ranking by cosine similarity returns the 20 closest messages, "release exploded" included.
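A minimal sketch of that retrieval step with NumPy, assuming every message vector has already been computed and stacked into a matrix. The toy 3-dimensional vectors stand in for real 1536-dimensional embeddings, `top_k_cosine` is a hypothetical helper, and a production system would typically use a vector index rather than this brute-force scan.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, matrix: np.ndarray, k: int = 20) -> list[tuple[int, float]]:
    """Return (row index, cosine similarity) for the k rows of `matrix` closest to `query`."""
    # Normalize rows and the query to unit length so a dot product equals cosine similarity.
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = m @ q
    k = min(k, len(scores))
    # argsort is ascending; reverse it so the highest similarity comes first.
    best = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in best]


if __name__ == "__main__":
    messages = ["deploy failed on prod", "Friday's release exploded", "lunch menu for today"]
    # Toy vectors (hypothetical): the first two point in a similar direction.
    vectors = np.array([
        [0.9, 0.1, 0.0],
        [0.8, 0.3, 0.1],
        [0.0, 0.1, 0.9],
    ])
    query_vec = np.array([1.0, 0.2, 0.0])  # stand-in for the embedded search query
    for idx, score in top_k_cosine(query_vec, vectors, k=2):
        print(f"{score:.3f}  {messages[idx]}")
```

With these toy vectors the two deploy-related messages come back first, even though they share no keywords.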

A deeper look

Code examples

OpenAI embedding + cosine similarity (Python)

from openai import OpenAI
import numpy as np

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v_king   = embed("king")
v_queen  = embed("queen")
v_potato = embed("potato")

print(cosine(v_king, v_queen))   # ≈ 0.78  (close in meaning)
print(cosine(v_king, v_potato))  # ≈ 0.23  (far apart)

When to use

Semantic search (beyond full-text)
Indexing chunks for RAG
Recommendation systems (user/item similarity)
Duplicate detection (near-duplicates, not exact)
Using embeddings as input features for classification
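For duplicate detection, a common approach is to flag pairs whose cosine similarity exceeds a threshold. This is a minimal sketch: the 0.95 cutoff and the `near_duplicates` helper are hypothetical, and a sensible threshold depends on the model and the data, so it should be tuned on labeled examples.

```python
import numpy as np


def near_duplicates(vectors: np.ndarray, threshold: float = 0.95) -> list[tuple[int, int, float]]:
    """Return (i, j, similarity) for every pair i < j with cosine similarity >= threshold."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit.T  # full pairwise cosine similarity matrix
    pairs = []
    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once, skipping self-pairs
            if sims[i, j] >= threshold:
                pairs.append((i, j, float(sims[i, j])))
    return pairs


if __name__ == "__main__":
    # Toy vectors (hypothetical): rows 0 and 1 are almost parallel, row 2 is unrelated.
    vecs = np.array([
        [1.0, 0.0, 0.0],
        [0.99, 0.05, 0.0],
        [0.0, 1.0, 0.0],
    ])
    print(near_duplicates(vecs))
```

This compares every pair, which is quadratic in the number of items; for large collections an approximate nearest-neighbor index is the usual substitute.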

When not to use

Exact string matches (slug, ID, username)
Structured queries (WHERE created_at > X)
When decisions must be explainable
Re-embedding rarely changing data with a costly model (cache the vectors instead)
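When the data rarely changes, vectors can be stored and reused instead of recomputed. A minimal in-memory sketch, assuming any `embed_fn` that maps text to a vector (the `embed` function from the code example above would do). `EmbeddingCache` is a hypothetical class; keys include the model name so that caches for different models can share one store without mixing vectors. A real system would persist the store in a database or on disk.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, str]


class EmbeddingCache:
    """Cache keyed by (model name, SHA-256 of the text)."""

    def __init__(self, model: str, embed_fn: Callable[[str], List[float]],
                 store: Optional[Dict[Key, List[float]]] = None):
        self.model = model
        self.embed_fn = embed_fn
        # A store may be shared between caches; the model name in the key keeps them apart.
        self._store = store if store is not None else {}

    def _key(self, text: str) -> Key:
        return (self.model, hashlib.sha256(text.encode("utf-8")).hexdigest())

    def get(self, text: str) -> List[float]:
        key = self._key(text)
        if key not in self._store:  # only call the (paid) model on a cache miss
            self._store[key] = self.embed_fn(text)
        return self._store[key]


if __name__ == "__main__":
    calls = []

    def fake_embed(text: str) -> List[float]:
        # Hypothetical stand-in for a real embedding API call.
        calls.append(text)
        return [float(len(text)), 1.0]

    cache = EmbeddingCache("text-embedding-3-small", fake_embed)
    cache.get("hello")
    cache.get("hello")  # served from the cache, no second call
    print(len(calls))   # 1
```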

Common pitfalls

Embed and forget

Upgrading the embedding model means re-embedding everything: vectors from the old and new models do not live in the same space, so they cannot be meaningfully compared. Plan the migration up front.
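One way to make that migration explicit is to store the model name next to every vector and refuse to compare vectors produced by different models. A minimal sketch; `StoredVector`, `similarity`, and `needs_reembedding` are hypothetical names, not part of any library.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class StoredVector:
    model: str           # name of the model that produced the vector
    values: List[float]


def similarity(a: StoredVector, b: StoredVector) -> float:
    """Cosine similarity, allowed only between vectors from the same model."""
    if a.model != b.model:
        raise ValueError(f"cannot compare {a.model!r} with {b.model!r}; re-embed first")
    dot = sum(x * y for x, y in zip(a.values, b.values))
    norm = math.sqrt(sum(x * x for x in a.values)) * math.sqrt(sum(y * y for y in b.values))
    return dot / norm


def needs_reembedding(rows: List[StoredVector], current_model: str) -> List[int]:
    """Indices of stored rows that were produced by a model other than the current one."""
    return [i for i, row in enumerate(rows) if row.model != current_model]


if __name__ == "__main__":
    # Hypothetical model names.
    old = StoredVector("model-v1", [1.0, 0.0])
    new = StoredVector("model-v2", [1.0, 0.0])
    print(needs_reembedding([old, new], current_model="model-v2"))  # [0]
```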

Underestimating chunking

Embedding a whole document as a single vector loses detail: 50 pages squeezed into one vector will match a narrow query poorly. Split documents into well-sized chunks and embed each one.
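A minimal chunking sketch: split text into fixed-size word windows with some overlap, so a sentence cut at one boundary still appears whole in a neighboring chunk. Word counts are a rough stand-in for tokens here, `chunk_words` is a hypothetical helper, and the 200-word size and 40-word overlap are illustrative defaults; many pipelines split on tokens, sentences, or headings instead.

```python
from typing import List


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Split `text` into windows of `size` words, each overlapping the previous by `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reaches the end
            break
    return chunks


if __name__ == "__main__":
    doc = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(doc, size=4, overlap=1):
        print(chunk)
```

Each chunk is then embedded separately, so a query can match the specific passage rather than an average of the whole document.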

Embedding non-English with English-only models

A model trained mostly on English performs poorly on other languages. Prefer a multilingual model such as Cohere's multilingual embeddings or BGE-M3.