k-Nearest Neighbors
k-NN · Predict by your neighbors
For a new point, look at the k closest training examples and use their majority label (classification) or mean label (regression) — simple, surprisingly strong.
k-NN has essentially no training step; it simply stores the training data. At prediction time it finds the k closest points (usually by Euclidean distance) and returns the majority label for classification or the mean label for regression. That's why k-NN is called a lazy learner — work is deferred to inference.
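The whole procedure fits in a few lines. A minimal sketch in NumPy, assuming X_train and y_train are arrays and knn_predict is just an illustrative name:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Distance from the new point to every stored training example
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k nearest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote for classification; use y_train[nearest].mean() for regression
    return Counter(y_train[nearest]).most_common(1)[0][0]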
Three knobs matter most: k (neighbor count), distance metric (Euclidean, Manhattan, cosine), and feature scaling. Tiny k → noisy decisions (overfit), large k → smooth, sometimes over-smooth (underfit). Pick odd k for binary classification to avoid ties.
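To make the metric choice concrete, here is a quick comparison of the three on one pair of vectors using SciPy (note that SciPy's cosine is a distance, i.e. 1 minus cosine similarity):

from scipy.spatial.distance import euclidean, cityblock, cosine

u, v = [1.0, 2.0, 3.0], [2.0, 4.0, 1.0]
print(euclidean(u, v))   # straight-line distance
print(cityblock(u, v))   # Manhattan: sum of absolute coordinate differences
print(cosine(u, v))      # 1 - cosine similarity: compares direction, ignores magnitude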
k-NN makes no assumptions about data shape, captures nonlinear boundaries, and works for both classification and regression. Its cost is at inference time: every new point has to be compared to the entire training set. KD-Tree, Ball-Tree, and Approximate Nearest Neighbor (ANN) libraries make it practical at scale.
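In scikit-learn the tree-backed search is one constructor argument away; a small sketch on synthetic data:

from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=5000, n_features=10, random_state=0)

# algorithm can be "brute", "kd_tree", "ball_tree", or "auto" (the default heuristic)
knn = KNeighborsClassifier(n_neighbors=7, algorithm="kd_tree")
knn.fit(X, y)            # "fitting" here just builds the tree index over the stored points
print(knn.predict(X[:3]))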
A new restaurant opens on your block — is it good? You check the five most-similar restaurants nearby; if four are well-rated, this one probably is too. You didn't derive a "restaurant goodness law"; you reasoned from the closest examples. k-NN does exactly that — no rules, just neighborhoods.
An e-commerce site wants "people who liked this also liked" recommendations. Classic collaborative filtering is essentially a k-NN variant: build a vector per item (user behavior, category, price, description embedding), find the k closest items to a given one, return them as recommendations.
Because the item space is high-dimensional, cosine similarity replaces Euclidean distance. An ANN library (FAISS or HNSW) brings query latency over millions of items down to milliseconds. There is no training step; adding a new item just inserts its vector, so the recommender grows continuously in production.
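At small scale the same item-to-item lookup can be sketched with scikit-learn's NearestNeighbors standing in for a real ANN index; the random item vectors below are placeholders for the behavior, category, price, and embedding features described above:

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
item_vectors = rng.normal(size=(10_000, 64))   # placeholder item embeddings

# Cosine distance instead of Euclidean for the high-dimensional item space
index = NearestNeighbors(n_neighbors=6, metric="cosine").fit(item_vectors)

# Neighbors of item 0; the first hit is the item itself, so drop it
_, idx = index.kneighbors(item_vectors[[0]])
print(idx[0][1:])

At production scale the same kneighbors call would be replaced by a query against a FAISS or HNSW index.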
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# Example data; substitute your own X, y
X, y = load_breast_cancer(return_X_y=True)

# Scale mismatch kills k-NN — scaler is mandatory
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=7, weights="distance")),
])

for k in [1, 3, 5, 7, 11, 21]:
    pipe.set_params(knn__n_neighbors=k)
    scores = cross_val_score(pipe, X, y, cv=5, scoring="f1_macro")
    print(f"k={k:2d}  F1={scores.mean():.3f} ± {scores.std():.3f}")

When to use k-NN
- Small-to-medium dataset where neighbor search is fast
- Nonlinear, irregular class boundaries
- Online updating without retraining (just add the new point)
- Quick baseline or recommender
When to avoid it
- Large data with low-latency requirements — every prediction scans the dataset
- Very high dimensions where distances flatten out (curse of dimensionality)
- Features at different scales without normalization
Forgetting to scale
If one feature is 0–1 and another 0–1M, only the big one matters. Always scale first.
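A toy illustration with made-up numbers: a 0.8 gap in the small feature vanishes next to a 1,000 gap in the large one.

import numpy as np

a = np.array([0.1, 500_000.0])   # [0-1 feature, 0-1M feature]
b = np.array([0.9, 501_000.0])

# Euclidean distance is roughly 1000.0003: the first feature contributes almost nothing
print(np.linalg.norm(a - b))

After StandardScaler both differences are expressed in standard deviations and compete on equal footing.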
Bad k choice
k=1 is noise-sensitive; k=N predicts the global majority and is meaningless. Cross-validate over odd values.
Curse of dimensionality
In hundreds of dimensions all points look roughly equally far apart, so the notion of a nearest neighbor stops being informative. Reduce dimensions with PCA or UMAP first; accuracy often jumps.
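One way to act on that advice, sketched with scikit-learn (the 95% variance target is an arbitrary illustrative choice):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Scale, keep the components explaining ~95% of the variance, then run k-NN in the reduced space
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=0.95)),
    ("knn", KNeighborsClassifier(n_neighbors=7, weights="distance")),
])
# pipe.fit(X_train, y_train); print(pipe.score(X_test, y_test))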