Recommendation System
Personalization at scale
A system that surfaces the next item, piece of content, or product a user is likely to want: the backbone of Netflix, Spotify, Amazon, and YouTube.
Recommenders answer one question: out of millions of options, which ten are most relevant for this user? Three main approaches:
- Content-based: uses item features. The user liked action films → recommend other action films with similar attributes. Resilient to cold-start; rarely surprising (see the sketch after this list).
- Collaborative filtering: "users like you liked this". User-based (find similar users) or item-based (find items co-liked with what you like). Matrix factorization (SVD, ALS) is the classic.
- Hybrid: the modern default. Add deep learning on top (Two-Tower models, transformer-based recommenders, sequence-aware models).
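A minimal content-based sketch (the toy catalog, scikit-learn TF-IDF, and averaging of liked-item vectors are illustrative assumptions, not from the text): describe each item by its attributes, build a user profile from liked items, and rank the rest by cosine similarity.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalog: item id -> attribute text (genre tags, keywords)
catalog = {
    "die_hard": "action thriller hostage explosions",
    "john_wick": "action revenge assassin gunfight",
    "notting_hill": "romance comedy london bookshop",
    "heat": "action crime heist detective",
}
item_ids = list(catalog)
vectors = TfidfVectorizer().fit_transform(catalog.values())

liked = ["die_hard"]                      # the user's history
profile = vectors[[item_ids.index(i) for i in liked]].mean(axis=0)
scores = cosine_similarity(np.asarray(profile), vectors).ravel()

# Recommend the most similar items the user hasn't interacted with yet
for idx in scores.argsort()[::-1]:
    if item_ids[idx] not in liked:
        print(f"{item_ids[idx]}: {scores[idx]:.3f}")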
Quality isn't only accuracy. Diversity (don't trap the user in a niche), novelty, serendipity, and fairness (avoiding filter bubbles, giving exposure to small creators) are tracked as metrics. The behavior of the entire list is judged, not a single prediction.
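Two of these list-level measures are easy to compute directly. The sketch below (function names and inputs are my own illustration) shows intra-list diversity as the average pairwise cosine distance within one recommended list, and novelty as the mean self-information of the recommended items' popularity.

import numpy as np

def intra_list_diversity(item_vectors):
    """Average pairwise cosine distance between the items in one recommended list."""
    v = item_vectors / np.linalg.norm(item_vectors, axis=1, keepdims=True)
    sims = v @ v.T
    n = len(v)
    mean_sim = (sims.sum() - n) / (n * (n - 1))   # mean off-diagonal similarity
    return 1.0 - mean_sim

def novelty(recommended_ids, play_counts):
    """Mean -log2(popularity share) of recommended items; higher = less mainstream."""
    total = sum(play_counts.values())
    return float(np.mean([-np.log2(play_counts[i] / total) for i in recommended_ids]))

print(intra_list_diversity(np.random.rand(5, 64)))   # 5 items, 64-dim vectors
print(novelty(["indie_track", "hit"], {"indie_track": 40, "hit": 90_000, "other": 10_000}))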
A skilled bookseller. You bought a thriller; they remember. On your next visit, they either suggest the new release by an author you liked (content-based), or tell you "thriller readers like you have been into a Japanese writer lately" (collaborative). A good bookseller balances both: never the same author every time, never something completely off-target.
A music app's "Made for you" runs in three stages:
1. Candidate generation: out of 50M songs, fetch the 1000 nearest neighbors of the user vector via ANN (FAISS/HNSW) in milliseconds.
2. Ranking: the 1000 candidates are scored 0–1 by gradient boosting (or a ranking transformer) using user × song features. Top 50 selected.
3. Re-ranking: apply diversity (no two songs by the same artist back-to-back), novelty (boost unheard songs), and business rules (promote new releases). Output: 30 songs.
This three-stage architecture (candidate generation / ranking / re-ranking) is standard at major platforms; a compact sketch follows below. Training data: implicit signals (clicks, listens, skips) and explicit signals (likes, saves, shares).
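A sketch of the three stages under stated assumptions: the song embeddings and user vector are random stand-ins, FAISS/HNSW does the candidate fetch as described above, a plain dot product stands in for the gradient-boosted ranker, and the re-rank applies only the one-song-per-artist rule.

import numpy as np
import faiss                                                  # pip install faiss-cpu

dim, n_songs = 64, 100_000
song_vecs = np.random.rand(n_songs, dim).astype("float32")    # stand-in embeddings
song_artist = np.random.randint(0, 5_000, size=n_songs)       # artist id per song

# 1. Candidate generation: ANN index (HNSW graph) over every song vector
index = faiss.IndexHNSWFlat(dim, 32)
index.add(song_vecs)
user_vec = np.random.rand(1, dim).astype("float32")
_, candidates = index.search(user_vec, 1000)
candidates = candidates[0]

# 2. Ranking: score user x song pairs; a dot product stands in for the
#    gradient-boosted or transformer ranker described above
scores = song_vecs[candidates] @ user_vec[0]
top50 = candidates[np.argsort(scores)[::-1][:50]]

# 3. Re-ranking: diversity rule (at most one song per artist), keep 30
playlist, seen_artists = [], set()
for song in top50:
    if song_artist[song] not in seen_artists:
        playlist.append(int(song))
        seen_artists.add(int(song_artist[song]))
    if len(playlist) == 30:
        break
print(playlist[:10])

The collaborative model that produces the user and item vectors in the first place can be as simple as ALS matrix factorization: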
import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares
# Sparse matrix: users × items, cells = interaction count
user_item = sp.csr_matrix(...)
model = AlternatingLeastSquares(
factors=64,
regularization=0.01,
iterations=20,
use_gpu=False,
)
model.fit(user_item)
# Top 10 recommendations for user 42
# (implicit >= 0.5 returns parallel arrays of ids and scores)
item_ids, scores = model.recommend(42, user_item[42], N=10)
for item_id, score in zip(item_ids, scores):
    print(f"item {item_id}: {score:.3f}")
# Item-item similarity: 5 items closest to item 123 in the latent space
similar = model.similar_items(123, N=5)

A Two-Tower model goes a step further: separate MLP "towers" turn user and item embeddings into vectors whose dot product is the relevance score.

import torch
import torch.nn as nn
class TwoTower(nn.Module):
    def __init__(self, n_users, n_items, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.user_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.item_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, user_ids, item_ids):
        u = self.user_mlp(self.user_emb(user_ids))
        i = self.item_mlp(self.item_emb(item_ids))
        return (u * i).sum(dim=-1)  # dot product; L2-normalize u and i for cosine
# After training, item vectors go into an ANN index;
# at runtime each user query maps to N nearest items in milliseconds.

When to use:
- Many items (thousands+) and you need personalization
- Plenty of interaction data (clicks, watches, purchases)
- Personalizing home, email, push
- Cross-sell, related items, 'you might also like'
When to avoid:
- Few items (10–50) — explicit rules are clearer and effective
- Insufficient interaction data — cold-start dominates
- Strict fairness/diversity requirements without baseline evaluation
Filter bubble / reinforcement
Recommending only what a user already liked traps them in a narrow taste profile. Diversity and serendipity metrics are the necessary counterweight.
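A common counterweight in practice is a maximal-marginal-relevance (MMR) style re-rank, trading a little predicted relevance for dissimilarity to items already picked. A minimal sketch with hypothetical inputs:

import numpy as np

def mmr_rerank(scores, item_vecs, k=10, lam=0.7):
    """Greedily pick k items balancing relevance (scores) against similarity to
    items already selected; lam=1.0 is pure relevance, lam=0.0 pure diversity."""
    v = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    selected, remaining = [], list(range(len(scores)))
    while remaining and len(selected) < k:
        def gain(i):
            max_sim = max((float(v[i] @ v[j]) for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * max_sim
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected

picks = mmr_rerank(np.random.rand(100), np.random.rand(100, 64), k=10)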
Cold-start
A new user or new item has no history. Fall back to content features, popular items, and fast onboarding flows that gather signal quickly.
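A sketch of that fallback routing; the thresholds and helper functions (get_interactions, most_popular_items, content_based_similar, collaborative_model) are hypothetical placeholders for whatever the system actually provides.

def recommend_for(user_id, n=10):
    # Hypothetical routing logic; the thresholds are illustrative, not tuned values.
    history = get_interactions(user_id)               # assumed data-access helper
    if not history:
        return most_popular_items(n)                  # no signal: popularity prior
    if len(history) < 5:
        return content_based_similar(history, n)      # thin signal: item features
    return collaborative_model.recommend(user_id, n)  # enough history: full model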
Optimizing offline metrics
High AUC, NDCG, or MAP doesn't guarantee real clicks. Online A/B testing on CTR, conversion, and retention is the only honest measure.
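For reference, a minimal NDCG@k computation over a single ranked list (the graded relevance labels are hypothetical); the point of the pitfall is that this number is cheap to produce and easy to over-trust.

import numpy as np

def ndcg_at_k(relevances, k=10):
    """relevances: graded relevance of items in the order the model ranked them."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float((rel * discounts).sum())
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts).sum())
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k([3, 2, 0, 1], k=4))   # just below 1.0: a zero-relevance item is ranked too high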