k-Nearest Neighbors
k-NN · Predict by your neighbors
For a new point, look at the k closest training examples and use their majority label (classification) or mean label (regression) — simple, surprisingly strong.
k-NN has essentially no training step; it simply stores the training data. At prediction time it finds the k closest points (usually by Euclidean distance) and returns the majority label for classification or the mean label for regression. That's why k-NN is called a lazy learner — work is deferred to inference.
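The whole procedure fits in a few lines. A minimal sketch in NumPy, assuming X_train and y_train are arrays and knn_predict is just an illustrative name:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Distance from the new point to every stored training example
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k nearest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote for classification; use y_train[nearest].mean() for regression
    return Counter(y_train[nearest]).most_common(1)[0][0]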
Three knobs matter most: k (neighbor count), distance metric (Euclidean, Manhattan, cosine), and feature scaling. Tiny k → noisy decisions (overfit), large k → smooth, sometimes over-smooth (underfit). Pick odd k for binary classification to avoid ties.
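To make the metric choice concrete, here is a quick comparison of the three on one pair of vectors using SciPy (note that SciPy's cosine is a distance, i.e. 1 minus cosine similarity):

from scipy.spatial.distance import euclidean, cityblock, cosine

u, v = [1.0, 2.0, 3.0], [2.0, 4.0, 1.0]
print(euclidean(u, v))   # straight-line distance
print(cityblock(u, v))   # Manhattan: sum of absolute coordinate differences
print(cosine(u, v))      # 1 - cosine similarity: compares direction, ignores magnitude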
k-NN makes no assumptions about data shape, captures nonlinear boundaries, and works for both classification and regression. Its cost is at inference time: every new point has to be compared to the entire training set. KD-Tree, Ball-Tree, and Approximate Nearest Neighbor (ANN) libraries make it practical at scale.
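In scikit-learn the tree-backed search is one constructor argument away; a small sketch on synthetic data:

from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=5000, n_features=10, random_state=0)

# algorithm can be "brute", "kd_tree", "ball_tree", or "auto" (the default heuristic)
knn = KNeighborsClassifier(n_neighbors=7, algorithm="kd_tree")
knn.fit(X, y)            # "fitting" here just builds the tree index over the stored points
print(knn.predict(X[:3]))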
A new restaurant opens on your block — is it good? You check the five most-similar restaurants nearby; if four are well-rated, this one probably is too. You didn't derive a "restaurant goodness law"; you reasoned from the closest examples. k-NN does exactly that — no rules, just neighborhoods.
An e-commerce site wants "people who liked this also liked" recommendations. Classic collaborative filtering is essentially a k-NN variant: build a vector per item (user behavior, category, price, description embedding), find the k closest items to a given one, return them as recommendations.
Because the item space is high-dimensional, cosine similarity replaces Euclidean distance. An ANN library (FAISS or HNSW) brings query latency over millions of items down to milliseconds. There is no training step; adding a new item just inserts its vector, so the recommender grows continuously in production.
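At small scale the same item-to-item lookup can be sketched with scikit-learn's NearestNeighbors standing in for a real ANN index; the random item vectors below are placeholders for the behavior, category, price, and embedding features described above:

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
item_vectors = rng.normal(size=(10_000, 64))   # placeholder item embeddings

# Cosine distance instead of Euclidean for the high-dimensional item space
index = NearestNeighbors(n_neighbors=6, metric="cosine").fit(item_vectors)

# Neighbors of item 0; the first hit is the item itself, so drop it
_, idx = index.kneighbors(item_vectors[[0]])
print(idx[0][1:])

At production scale the same kneighbors call would be replaced by a query against a FAISS or HNSW index.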
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# Example data; substitute your own X, y
X, y = load_breast_cancer(return_X_y=True)

# Scale mismatch kills k-NN — scaler is mandatory
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=7, weights="distance")),
])

for k in [1, 3, 5, 7, 11, 21]:
    pipe.set_params(knn__n_neighbors=k)
    scores = cross_val_score(pipe, X, y, cv=5, scoring="f1_macro")
    print(f"k={k:2d}  F1={scores.mean():.3f} ± {scores.std():.3f}")

When to use k-NN
- Small-to-medium dataset where neighbor search is fast
- Nonlinear, irregular class boundaries
- Online updating without retraining (just add the new point)
- Quick baseline or recommender
When to avoid it
- Large data with low-latency requirements — every prediction scans the dataset
- Very high dimensions where distances flatten out (curse of dimensionality)
- Features at different scales without normalization
Forgetting to scale
If one feature is 0–1 and another 0–1M, only the big one matters. Always scale first.
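A toy illustration with made-up numbers: a 0.8 gap in the small feature vanishes next to a 1,000 gap in the large one.

import numpy as np

a = np.array([0.1, 500_000.0])   # [0-1 feature, 0-1M feature]
b = np.array([0.9, 501_000.0])

# Euclidean distance is roughly 1000.0003: the first feature contributes almost nothing
print(np.linalg.norm(a - b))

After StandardScaler both differences are expressed in standard deviations and compete on equal footing.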
Bad k choice
k=1 is noise-sensitive; k=N predicts the global majority and is meaningless. Cross-validate over odd values.
Curse of dimensionality
In hundreds of dimensions all points look roughly equally far apart, so the notion of a nearest neighbor stops being informative. Reduce dimensions with PCA or UMAP first; accuracy often jumps.
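One way to act on that advice, sketched with scikit-learn (the 95% variance target is an arbitrary illustrative choice):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Scale, keep the components explaining ~95% of the variance, then run k-NN in the reduced space
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=0.95)),
    ("knn", KNeighborsClassifier(n_neighbors=7, weights="distance")),
])
# pipe.fit(X_train, y_train); print(pipe.score(X_test, y_test))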