AI Atlas
Beginner · ~2 min read · #k-nn · #k-nearest-neighbors · #lazy-learning

k-Nearest Neighbors

k-NN · Predict by your neighbors

For a new point, look at the k closest training examples and use their majority label (classification) or mean label (regression) — simple, surprisingly strong.

[Illustration: k-nearest neighbors with k = 3 — predict by the majority label of the k nearest training points.]
Definition

k-NN has no real training step: it simply stores the training data. At prediction time it finds the k closest points (usually by Euclidean distance) and returns the majority label for classification or the mean label for regression. That's why k-NN is called a lazy learner — work is deferred to inference.
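The whole prediction step fits in a few lines. Here is a minimal from-scratch sketch (toy data and a hypothetical `knn_predict` helper, not a production implementation):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every stored training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]            # majority label

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array(["a", "a", "b", "b"])
print(knn_predict(X_train, y_train, np.array([0.05, 0.1])))  # → a
```

Note that "training" is just keeping `X_train` and `y_train` around; all the work happens inside `knn_predict`.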

Three knobs matter most: k (neighbor count), distance metric (Euclidean, Manhattan, cosine), and feature scaling. Tiny k → noisy decisions (overfit), large k → smooth, sometimes over-smooth (underfit). Pick odd k for binary classification to avoid ties.

k-NN makes no assumptions about data shape, captures nonlinear boundaries, and works for both classification and regression. Its cost is at inference time: every new point has to be compared to the entire training set. KD-Tree, Ball-Tree, and Approximate Nearest Neighbor (ANN) libraries make it practical at scale.
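In scikit-learn the tree-based indexes mentioned above are one parameter away. A small sketch on random data (the dataset is made up for illustration):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))  # 10k points in 3-D

# Build a KD-tree index once; queries then avoid a full scan of all 10k points
index = NearestNeighbors(n_neighbors=5, algorithm="kd_tree").fit(X)
dist, idx = index.kneighbors(X[:1])  # neighbors of the first point

print(idx.shape)  # (1, 5)
# A point is its own nearest neighbor, at distance 0
print(idx[0][0], dist[0][0])
```

`algorithm="ball_tree"` works the same way; the default `"auto"` picks a reasonable structure for the data.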

Analogy

A new restaurant opens on your block — is it good? You check the five most-similar restaurants nearby; if four are well-rated, this one probably is too. You didn't derive a "restaurant goodness law"; you reasoned from the closest examples. k-NN does exactly that — no rules, just neighborhoods.

Real-world example

An e-commerce site wants "people who liked this also liked" recommendations. Classic collaborative filtering is essentially a k-NN variant: build a vector per item (user behavior, category, price, description embedding), find the k closest items to a given one, return them as recommendations.

Because the space is high-dimensional, cosine similarity replaces Euclidean. An ANN library (FAISS or HNSW) brings query latency over millions of items down to milliseconds. No training step; adding a new item only inserts its vector. The recommendation system grows continuously in production.
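The item-to-item lookup can be sketched with scikit-learn's cosine metric; in production you would swap in FAISS or an HNSW index, but the shape of the query is the same. The item vectors here are made up for illustration:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy item vectors (rows = items); in production these would be
# behavior/category/embedding features
item_vectors = np.array([
    [1.0, 0.0, 0.1],   # item 0
    [0.9, 0.1, 0.0],   # item 1 — points the same direction as item 0
    [0.0, 1.0, 0.9],   # item 2
    [0.1, 0.9, 1.0],   # item 3 — points the same direction as item 2
])

# Cosine distance compares direction, not magnitude
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(item_vectors)
dist, idx = nn.kneighbors(item_vectors[[0]])

# idx[0][0] is item 0 itself; idx[0][1] is its closest "also liked" item
print(idx[0][1])  # → 1
```

Adding a new item means appending one row and refitting the index — no model retraining.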

Code examples
scikit-learn · k-NN classification (Python)
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# Example data — replace with your own features/labels
X, y = load_iris(return_X_y=True)

# Scale mismatch kills k-NN — scaler is mandatory
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=7, weights="distance")),
])

# Cross-validate over candidate k values
for k in [1, 3, 5, 7, 11, 21]:
    pipe.set_params(knn__n_neighbors=k)
    scores = cross_val_score(pipe, X, y, cv=5, scoring="f1_macro")
    print(f"k={k:2d}  F1={scores.mean():.3f} ± {scores.std():.3f}")
When to use
  • Small-to-medium dataset where neighbor search is fast
  • Nonlinear, irregular class boundaries
  • Online updating without retraining (just add the new point)
  • Quick baseline or recommender
When not to use
  • Large data with low-latency requirements — every prediction scans the dataset
  • Very high dimensions where distances flatten out (curse of dimensionality)
  • Features at different scales without normalization
Common pitfalls

Forgetting to scale

If one feature is 0–1 and another 0–1M, only the big one matters. Always scale first.
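A two-minute demonstration of why, on made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Feature 0 spans 0–1, feature 1 spans 0–1,000,000
X = np.array([
    [0.0,       0.0],
    [1.0,      10.0],     # far in feature 0, near in feature 1
    [0.0, 900_000.0],     # identical in feature 0, far in feature 1
])

# Raw Euclidean distances from point 0: feature 1 dominates completely
raw = np.linalg.norm(X - X[0], axis=1)
print(raw.round(1))

# After standardization both features contribute comparably,
# and the two neighbors end up roughly equidistant
Xs = StandardScaler().fit_transform(X)
scaled = np.linalg.norm(Xs - Xs[0], axis=1)
print(scaled.round(2))
```

Putting `StandardScaler` inside the pipeline, as in the code example above, makes this impossible to forget.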

Bad k choice

k=1 is noise-sensitive; k=N predicts the global majority and is meaningless. Cross-validate over odd values.

Curse of dimensionality

In hundreds of dimensions, points all look roughly equally far apart, so "nearest" stops being informative. Reduce dimensionality with PCA or UMAP first; accuracy often improves noticeably.
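One way to sketch this fix is a PCA step inside the same pipeline, shown here on scikit-learn's 64-dimensional digits dataset (the component count of 20 is an illustrative choice, not a recommendation):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)  # 64 features per image

# Project to a lower-dimensional space before the neighbor search
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=20)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])

score = cross_val_score(pipe, X, y, cv=3).mean()
print(f"accuracy: {score:.3f}")
```

Cross-validating over `pca__n_components` alongside `knn__n_neighbors` lets you tune both knobs at once.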