Vector
Numeric representation
An ordered list of numbers that represents an item (word, image, user) as a point in high-dimensional space.
A vector is just a list: [0.21, -0.45, 0.78, ...]. In AI it usually
has 256–4096 dimensions and encodes the "meaning" of an item.
Similar items end up close together. The vectors for "cat" and "dog" sit near each other; "cat" and "car" sit far apart. Closeness is typically measured with cosine similarity or Euclidean distance.
Vectors support arithmetic — addition, subtraction, scaling. Famous
example: vec("king") - vec("man") + vec("woman") ≈ vec("queen").
Meaning genuinely lives inside the math.
Cities have lat/long coordinates: Istanbul (41, 29), Ankara (40, 33). Distance tells you how close they are. Vectors do the same for meaning — except instead of 2 dimensions, they use hundreds.
An e-commerce search. User types "winter jacket". Classic search returns products containing the words "winter" and "jacket" — but not "thermal parka." Vector search: turn "winter jacket" into a vector, fetch the 20 closest product vectors. Coats, parkas, anoraks — all come back because they're semantically close.
Same logic powers: user recommendations (similar user vectors), visual search (similar image vector), spam detection (close to known bad-message vectors).
- When semantic similarity matters (search, recommendations, clustering)
- Building RAG — chunks are stored as vectors
- When classic keyword matching falls short
- Multimodal search (find an image by text, etc.)
- Exact-match lookups (SKU, email, phone) — keyword + B-tree is faster
- Sparse data — vector overhead doesn't beat simple pattern matching
- Explainability is required — 'why this result?' is hard to answer
Dimension mismatch
Different models produce different-sized vectors. You can't mix OpenAI embeddings with Cohere ones in the same DB.
Forgetting to normalize
For cosine similarity vectors must be normalized (length = 1). Otherwise results lie.
Ignoring the curse of dimensionality
In very high-dimensional space everything looks 'somewhat far' from everything. 'Nearest' loses meaning at 4096 dims. 768–1536 is usually enough.