AI Atlas

Embedding

Meaning represented as numbers

A vector representation that captures the meaning of text, images, or audio.

[Figure: Word → meaning-vector. The word "king" passes through an embedding model and comes out as a 1536-dimensional vector: [0.21, -0.45, 0.78, 0.12, -0.33, 0.66, …]. Caption: meaning becomes math: "king − man + woman ≈ queen".]

Definition

An embedding model takes an input such as the word "king" and represents it as a fixed-length vector of numbers. The vector encodes the input's meaning: inputs with related meanings get nearby vectors, so the embeddings of "king" and "queen" are close to each other.

Semantic search, recommendation systems, clustering, and RAG (retrieval-augmented generation) are all powered by embeddings. Modern models include OpenAI text-embedding-3, Cohere Embed v3, Voyage, BGE, and Jina.

Model choice is critical: language support, vector dimension (256–3072), context window (maximum input tokens), and cost ($0.02–$0.13 per 1M tokens) vary across models. A poor choice leads to weak retrieval and poor RAG performance.
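To make the trade-off concrete, here is a rough back-of-the-envelope sketch. The corpus size (100,000 chunks of about 500 tokens) is a hypothetical figure, the prices are the two ends of the range quoted above, and the storage estimate assumes 4-byte float32 values with no index overhead.

```python
def embedding_cost_usd(total_tokens: int, price_per_million_usd: float) -> float:
    """Cost of embedding `total_tokens` tokens at a given price per 1M tokens."""
    return total_tokens / 1_000_000 * price_per_million_usd


def index_size_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage (float32 by default), ignoring any index overhead."""
    return num_vectors * dim * bytes_per_value


# Hypothetical corpus: 100,000 chunks of about 500 tokens each.
num_chunks = 100_000
total_tokens = num_chunks * 500

low = embedding_cost_usd(total_tokens, 0.02)   # cheapest price in the quoted range
high = embedding_cost_usd(total_tokens, 0.13)  # most expensive price in the quoted range
size = index_size_bytes(num_chunks, dim=1536)

print(f"one-time embedding cost: ${low:.2f} to ${high:.2f}")
print(f"raw vector storage at 1536 dimensions: {size / 1e6:.1f} MB")
```

For a corpus of this size the one-time embedding bill is small; vector dimension tends to matter more for storage and search latency than price does.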

Analogy

You compress a 12-megapixel image into 1536 numbers, yet those numbers still contain enough information to answer "is this a cat or a dog?". That’s what embeddings do: reduce detail while preserving meaning. It’s lossy compression — but it discards the right details.

Real-world example

Imagine a Slack message archive. You search for "I mentioned the deploy failure last month." A traditional keyword search returns only messages that contain both "deploy" and "failure"; it misses something like "Friday's release exploded" (different words, same meaning).

With embedding-based search, your query is converted into a vector, and each message already has its own vector. Using cosine similarity, the system returns the top 20 closest messages — including "release exploded".
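The retrieval step can be sketched in a few lines of NumPy. This is a minimal illustration, not a production search engine: the 3-dimensional vectors are invented for the example (real ones come from an embedding model), and `top_k` is a hypothetical helper name.

```python
import numpy as np


def top_k(query_vec: np.ndarray, message_vecs: np.ndarray, k: int = 20) -> list[tuple[int, float]]:
    """Return (row index, cosine similarity) for the k rows closest to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    m = message_vecs / np.linalg.norm(message_vecs, axis=1, keepdims=True)
    scores = m @ q                       # cosine similarity of every message to the query
    order = np.argsort(-scores)[:k]      # highest similarity first
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors invented for illustration; they are NOT real model output.
messages = [
    "Friday's release exploded",
    "Lunch menu for Monday",
    "Deploy failed on staging",
]
message_vecs = np.array([
    [0.9, 0.1, 0.2],
    [0.0, 1.0, 0.1],
    [0.8, 0.0, 0.3],
])
query_vec = np.array([1.0, 0.0, 0.2])    # stands in for the embedded search query

results = top_k(query_vec, message_vecs, k=2)
for i, score in results:
    print(f"{score:.3f}  {messages[i]}")
```

At larger scale the brute-force matrix product is usually replaced by an approximate nearest-neighbor index, but the ranking idea stays the same.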

A deeper look
[Figure: Meaning arithmetic. Points for king, man, woman, and queen, illustrating king − man + woman ≈ queen: the direction between meanings is mathematically preserved.]

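The king − man + woman ≈ queen relation can be tried out on hand-built toy vectors. The 2-dimensional vocabulary below is constructed by hand (axis 0 loosely means "royalty", axis 1 "gender") so that the arithmetic works exactly; with vectors from a real model the relation holds only approximately, and the input words are customarily excluded from the nearest-neighbor lookup.

```python
import numpy as np

# Hand-built toy vectors, not model output: axis 0 = "royalty", axis 1 = "gender".
vocab = {
    "king":   np.array([1.0, 1.0]),
    "queen":  np.array([1.0, -1.0]),
    "man":    np.array([0.0, 1.0]),
    "woman":  np.array([0.0, -1.0]),
    "potato": np.array([-1.0, 0.0]),
}


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def nearest(target: np.ndarray, exclude: set[str]) -> str:
    """Word whose vector has the highest cosine similarity to `target`."""
    candidates = {w: v for w, v in vocab.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))


target = vocab["king"] - vocab["man"] + vocab["woman"]
result = nearest(target, exclude={"king", "man", "woman"})
print(result)  # queen
```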
Code examples
OpenAI embedding + cosine similarity (Python)
from openai import OpenAI
import numpy as np

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v_king   = embed("king")
v_queen  = embed("queen")
v_potato = embed("potato")

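# Scores below are illustrative; exact values depend on the model.
print(cosine(v_king, v_queen))   # higher score, e.g. ≈ 0.78 (semantically close)
print(cosine(v_king, v_potato))  # lower score, e.g. ≈ 0.23 (semantically distant)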
When to use
  • Semantic search (beyond keyword matching)
  • Indexing chunks for RAG
  • Recommendation systems (user/item similarity)
  • Duplicate detection (near-duplicates rather than exact matches)
  • Using embeddings as features for classification
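As one illustration of the last item, embeddings can feed a very simple classifier. The sketch below uses a nearest-centroid rule on invented 2-dimensional vectors; the labels, the vectors, and the helper names `fit_centroids` and `predict` are all hypothetical, and a real system would more likely train a standard classifier on model-generated vectors.

```python
import numpy as np


def fit_centroids(vectors: np.ndarray, labels: list[str]) -> dict[str, np.ndarray]:
    """Average the vectors of each label into one centroid per class."""
    label_arr = np.array(labels)
    return {
        label: vectors[label_arr == label].mean(axis=0)
        for label in sorted(set(labels))
    }


def predict(vec: np.ndarray, centroids: dict[str, np.ndarray]) -> str:
    """Pick the label whose centroid is most cosine-similar to `vec`."""
    def cos(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(centroids, key=lambda label: cos(vec, centroids[label]))


# Toy training data invented for illustration (not real embeddings).
train_vecs = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
train_labels = ["bug", "bug", "billing", "billing"]

centroids = fit_centroids(train_vecs, train_labels)
label = predict(np.array([0.7, 0.3]), centroids)
print(label)  # bug
```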
When not to use
  • Exact string matching (slug, ID, username)
  • Structured queries (WHERE created_at > X)
  • When decisions must be strictly explainable
  • Re-embedding on every request: when model cost is high and the data rarely changes, compute each vector once and cache it
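A cache for embeddings can be as simple as a dictionary keyed by a hash of the model name and the text. The sketch below is one possible shape, not a prescribed design: `EmbeddingCache` is a hypothetical class, `fake_embed` stands in for a real (paid) API call, and a real deployment would likely persist the store in a database rather than in memory.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """In-memory cache so that each (model, text) pair is embedded only once."""

    def __init__(self, embed_fn: Callable[[str], list[float]], model: str):
        self._embed_fn = embed_fn
        self._model = model
        self._store: dict[str, list[float]] = {}
        self.misses = 0  # number of times the underlying model was actually called

    def _key(self, text: str) -> str:
        # The model name is part of the key: vectors from different models must not mix.
        return hashlib.sha256(f"{self._model}\n{text}".encode("utf-8")).hexdigest()

    def get(self, text: str) -> list[float]:
        key = self._key(text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]


def fake_embed(text: str) -> list[float]:
    # Stand-in for a real embedding API call; returns a meaningless 2-number vector.
    return [float(len(text)), float(text.count(" "))]


cache = EmbeddingCache(fake_embed, model="hypothetical-model-v1")
cache.get("deploy failed")
cache.get("deploy failed")  # served from the cache, no second model call
print(cache.misses)  # 1
```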
Common pitfalls

Embed and forget

Updating the embedding model requires re-embedding all data. Old and new vectors do not share the same space, so you must plan a proper migration.
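One way to make a migration manageable is to store the model name next to every vector, refuse to compare vectors from different models, and list what still needs re-embedding. The record layout and the function names below are assumptions made for this sketch, and the model names are placeholders.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(eq=False)
class StoredVector:
    doc_id: str
    model: str          # name of the model that produced `values`
    values: np.ndarray


def similarity(a: StoredVector, b: StoredVector) -> float:
    """Cosine similarity, allowed only within a single model's vector space."""
    if a.model != b.model:
        raise ValueError(f"cannot compare vectors from {a.model!r} and {b.model!r}")
    return float(
        np.dot(a.values, b.values) / (np.linalg.norm(a.values) * np.linalg.norm(b.values))
    )


def needs_reembedding(records: list[StoredVector], current_model: str) -> list[str]:
    """IDs of documents whose vectors were produced by an older model."""
    return [r.doc_id for r in records if r.model != current_model]


# Placeholder model names and toy vectors for illustration.
records = [
    StoredVector("a", "model-v2", np.array([0.1, 0.9])),
    StoredVector("b", "model-v1", np.array([0.7, 0.3])),
]
stale = needs_reembedding(records, current_model="model-v2")
print(stale)  # ['b']
```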

Underestimating chunking

Embedding an entire document as a single vector causes loss of detail. You don’t want to compress 50 pages into one vector. Proper chunking is essential.
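A minimal chunking sketch, assuming fixed-size windows with overlap. It counts whitespace-separated words for simplicity, whereas model limits are expressed in tokens; the default sizes are arbitrary, and splitting on headings or paragraphs often works better in practice.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of `chunk_size` words; consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk is then embedded separately, so a query can match the specific passage rather than an average of the whole document.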

Using English-only models for other languages

Models trained only on English perform poorly on other languages. For non-English or mixed-language data, prefer multilingual models such as Cohere's multilingual Embed model or BGE-M3.