AI Dictionary
Beginner · ~1 min read · #vector #math

Vector

Numeric representation

An ordered list of numbers that represents an item (word, image, user) as a point in high-dimensional space.

[Diagram: semantic space with axes dim 1 / dim 2 — animals (cat, dog, wolf…) cluster together, fruits (apple, pear, plum…) cluster elsewhere; similar meanings sit close, far apart means unrelated]
Definition

A vector is just a list: [0.21, -0.45, 0.78, ...]. In AI it usually has 256–4096 dimensions and encodes the "meaning" of an item.

Similar items end up close together. The vectors for "cat" and "dog" sit near each other; "cat" and "car" sit far apart. Closeness is typically measured with cosine similarity or Euclidean distance.
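Both measures are easy to sketch in plain Python. The three-dimensional "embeddings" below are invented for illustration; real ones come from an embedding model:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 3-D vectors, invented for illustration
cat = [0.9, 0.8, 0.1]
dog = [0.8, 0.9, 0.2]
car = [0.1, 0.2, 0.9]

print(cosine_similarity(cat, dog))  # close to 1: similar
print(cosine_similarity(cat, car))  # much lower: unrelated
```

Cosine similarity compares direction only, which is why it is the usual choice for embeddings; Euclidean distance also reacts to vector length.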

Vectors support arithmetic — addition, subtraction, scaling. Famous example: vec("king") - vec("man") + vec("woman") ≈ vec("queen"). Meaning genuinely lives inside the math.
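A toy version of that arithmetic, using invented 2-D vectors where one axis loosely stands for "masculine" and the other for "royal" (real embeddings have hundreds of dimensions and the result is only approximately equal):

```python
# Invented 2-D vectors: [masculine-ness, royal-ness]
king, man, woman, queen = [0.9, 0.9], [0.9, 0.1], [0.1, 0.1], [0.1, 0.9]

# vec("king") - vec("man") + vec("woman"), component by component
result = [k - m + w for k, m, w in zip(king, man, woman)]
print(result)  # lands (almost exactly) on queen's coordinates
```

Subtracting "man" removes the masculine component, adding "woman" keeps the royal one: the analogy falls out of plain arithmetic.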

Analogy

Cities have lat/long coordinates: Istanbul (41, 29), Ankara (40, 33). Distance tells you how close they are. Vectors do the same for meaning — except instead of 2 dimensions, they use hundreds.
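The analogy computes directly (coordinates rounded; London added here for contrast):

```python
import math

# City "vectors": (latitude, longitude), rounded
istanbul, ankara, london = (41, 29), (40, 33), (51, 0)

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(distance(istanbul, ankara))  # ~4.1: near each other
print(distance(istanbul, london))  # ~30.7: far apart
```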

Real-world example

Take e-commerce search. A user types "winter jacket". Classic keyword search returns products containing the words "winter" and "jacket" — but not a "thermal parka." Vector search instead turns "winter jacket" into a vector and fetches the 20 closest product vectors. Coats, parkas, anoraks all come back, because they're semantically close.
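A sketch of that retrieval loop, with invented 3-D product vectors and a brute-force scan (a real system would use model-generated embeddings and an approximate-nearest-neighbor index):

```python
import math

# Invented 3-D "embeddings"; a real system gets these from an embedding model
products = {
    "thermal parka":  [0.88, 0.70, 0.05],
    "wool coat":      [0.80, 0.60, 0.10],
    "rain anorak":    [0.75, 0.55, 0.20],
    "summer t-shirt": [0.05, 0.10, 0.90],
}
query = [0.85, 0.65, 0.10]  # pretend: the embedding of "winter jacket"

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Rank all products by similarity to the query, keep the top 2
top = sorted(products, key=lambda name: cosine(query, products[name]),
             reverse=True)[:2]
print(top)  # winter wear only; "summer t-shirt" never makes the cut
```

Note that none of the returned names share a word with the query: the match happens in vector space, not in the text.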

Same logic powers: user recommendations (similar user vectors), visual search (similar image vector), spam detection (close to known bad-message vectors).

When to use
  • When semantic similarity matters (search, recommendations, clustering)
  • Building RAG — chunks are stored as vectors
  • When classic keyword matching falls short
  • Multimodal search (find an image by text, etc.)
When not to use
  • Exact-match lookups (SKU, email, phone) — keyword + B-tree is faster
  • Sparse data — vector overhead doesn't beat simple pattern matching
  • Explainability is required — 'why this result?' is hard to answer
Common pitfalls

Dimension mismatch

Different models produce different-sized vectors. You can't mix OpenAI embeddings with Cohere ones in the same DB.
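A cheap defensive check catches this early. The sizes below (1536 vs 1024) are illustrative stand-ins for two different models' outputs, not a claim about any specific provider:

```python
# Stand-in vectors sized like two different models' outputs (sizes vary by model)
model_a_vec = [0.0] * 1536
model_b_vec = [0.0] * 1024

def dot(a, b):
    # Without this guard, zip() would silently truncate to the shorter
    # vector and return a meaningless score instead of failing loudly.
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))

try:
    dot(model_a_vec, model_b_vec)
except ValueError as err:
    print(err)  # dimension mismatch: 1536 vs 1024
```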

Forgetting to normalize

Many vector stores rank by plain dot product, which only equals cosine similarity when vectors are normalized to unit length (length = 1). Skip the normalization and scores reflect vector magnitude, not meaning.
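A minimal normalization sketch in plain Python, with toy values:

```python
import math

def normalize(v):
    # Divide every component by the vector's length
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

a = [3.0, 4.0]      # length 5
unit = normalize(a) # length 1
print(math.sqrt(sum(x * x for x in unit)))  # ≈ 1.0

# After normalizing, a plain dot product IS the cosine similarity
b = normalize([1.0, 0.0])
print(sum(x * y for x, y in zip(unit, b)))  # 0.6
```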

Ignoring the curse of dimensionality

In very high-dimensional space everything looks 'somewhat far' from everything. 'Nearest' loses meaning at 4096 dims. 768–1536 is usually enough.
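You can watch distances concentrate with nothing but the standard library. This sketch measures, for random points, how far the farthest neighbor is relative to the nearest; the gap collapses as dimensions grow:

```python
import math
import random

random.seed(0)

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrast(dim, n=200):
    """(farthest - nearest) / nearest distance from one random query
    to n random points. Shrinks as dimensionality grows."""
    query = [random.random() for _ in range(dim)]
    dists = [distance(query, [random.random() for _ in range(dim)])
             for _ in range(n)]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 16, 256, 4096):
    print(dim, round(contrast(dim), 2))  # the ratio drops steadily with dim
```

When that ratio is near zero, "nearest neighbor" is barely more meaningful than a random pick, which is why embedding sizes stay in the hundreds to low thousands.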