Recommendation System
Personalization at scale
A system that surfaces the next item, piece of content, or product a user is likely to want: the backbone of Netflix, Spotify, Amazon, and YouTube.
Recommenders answer one question: out of millions of options, which ten are most relevant for this user? Three main approaches:
- Content-based: uses item features. The user liked action films → recommend other action films with similar attributes. Resilient to cold-start; rarely surprising (see the sketch after this list).
- Collaborative filtering: "users like you liked this". User-based (find similar users) or item-based (find items co-liked with what you like). Matrix factorization (SVD, ALS) is the classic.
- Hybrid: the modern default. Add deep learning on top (Two-Tower models, transformer-based recommenders, sequence-aware models).
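A minimal content-based sketch (the toy catalog, scikit-learn TF-IDF, and averaging of liked-item vectors are illustrative assumptions, not from the text): describe each item by its attributes, build a user profile from liked items, and rank the rest by cosine similarity.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalog: item id -> attribute text (genre tags, keywords)
catalog = {
    "die_hard": "action thriller hostage explosions",
    "john_wick": "action revenge assassin gunfight",
    "notting_hill": "romance comedy london bookshop",
    "heat": "action crime heist detective",
}
item_ids = list(catalog)
vectors = TfidfVectorizer().fit_transform(catalog.values())

liked = ["die_hard"]                      # the user's history
profile = vectors[[item_ids.index(i) for i in liked]].mean(axis=0)
scores = cosine_similarity(np.asarray(profile), vectors).ravel()

# Recommend the most similar items the user hasn't interacted with yet
for idx in scores.argsort()[::-1]:
    if item_ids[idx] not in liked:
        print(f"{item_ids[idx]}: {scores[idx]:.3f}")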
Quality isn't only accuracy. Diversity (don't trap the user in a niche), novelty, serendipity, and fairness (avoiding filter bubbles, giving exposure to small creators) are tracked as metrics. The behavior of the entire list is judged, not a single prediction.
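Two of these list-level measures are easy to compute directly. The sketch below (function names and inputs are my own illustration) shows intra-list diversity as the average pairwise cosine distance within one recommended list, and novelty as the mean self-information of the recommended items' popularity.

import numpy as np

def intra_list_diversity(item_vectors):
    """Average pairwise cosine distance between the items in one recommended list."""
    v = item_vectors / np.linalg.norm(item_vectors, axis=1, keepdims=True)
    sims = v @ v.T
    n = len(v)
    mean_sim = (sims.sum() - n) / (n * (n - 1))   # mean off-diagonal similarity
    return 1.0 - mean_sim

def novelty(recommended_ids, play_counts):
    """Mean -log2(popularity share) of recommended items; higher = less mainstream."""
    total = sum(play_counts.values())
    return float(np.mean([-np.log2(play_counts[i] / total) for i in recommended_ids]))

print(intra_list_diversity(np.random.rand(5, 64)))   # 5 items, 64-dim vectors
print(novelty(["indie_track", "hit"], {"indie_track": 40, "hit": 90_000, "other": 10_000}))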
A skilled bookseller. You bought a thriller; they remember. On your next visit, they either suggest the new release by an author you liked (content-based), or tell you "thriller readers like you have been into a Japanese writer lately" (collaborative). A good bookseller balances both: never the same author every time, never something completely off-target.
A music app's "Made for you" runs in three stages:
1. Candidate generation: out of 50M songs, fetch the 1000 nearest neighbors of the user vector via ANN (FAISS/HNSW) in milliseconds.
2. Ranking: the 1000 candidates are scored 0–1 by gradient boosting (or a ranking transformer) using user × song features. Top 50 selected.
3. Re-ranking: apply diversity (no two songs by the same artist back-to-back), novelty (boost unheard songs), and business rules (promote new releases). Output: 30 songs.
This three-stage architecture (candidate generation / ranking / re-ranking) is standard at major platforms; a compact sketch follows below. Training data: implicit signals (clicks, listens, skips) and explicit signals (likes, saves, shares).
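A sketch of the three stages under stated assumptions: the song embeddings and user vector are random stand-ins, FAISS/HNSW does the candidate fetch as described above, a plain dot product stands in for the gradient-boosted ranker, and the re-rank applies only the one-song-per-artist rule.

import numpy as np
import faiss                                                  # pip install faiss-cpu

dim, n_songs = 64, 100_000
song_vecs = np.random.rand(n_songs, dim).astype("float32")    # stand-in embeddings
song_artist = np.random.randint(0, 5_000, size=n_songs)       # artist id per song

# 1. Candidate generation: ANN index (HNSW graph) over every song vector
index = faiss.IndexHNSWFlat(dim, 32)
index.add(song_vecs)
user_vec = np.random.rand(1, dim).astype("float32")
_, candidates = index.search(user_vec, 1000)
candidates = candidates[0]

# 2. Ranking: score user x song pairs; a dot product stands in for the
#    gradient-boosted or transformer ranker described above
scores = song_vecs[candidates] @ user_vec[0]
top50 = candidates[np.argsort(scores)[::-1][:50]]

# 3. Re-ranking: diversity rule (at most one song per artist), keep 30
playlist, seen_artists = [], set()
for song in top50:
    if song_artist[song] not in seen_artists:
        playlist.append(int(song))
        seen_artists.add(int(song_artist[song]))
    if len(playlist) == 30:
        break
print(playlist[:10])

The collaborative model that produces the user and item vectors in the first place can be as simple as ALS matrix factorization: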
import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares
# Sparse matrix: users × items, cells = interaction count
user_item = sp.csr_matrix(...)
model = AlternatingLeastSquares(
factors=64,
regularization=0.01,
iterations=20,
use_gpu=False,
)
model.fit(user_item)
# Top 10 recommendations for user 42
# (implicit >= 0.5 returns parallel arrays of ids and scores)
item_ids, scores = model.recommend(42, user_item[42], N=10)
for item_id, score in zip(item_ids, scores):
    print(f"item {item_id}: {score:.3f}")
# Item-item similarity: 5 items closest to item 123 in the latent space
similar = model.similar_items(123, N=5)

A Two-Tower model goes a step further: separate MLP "towers" turn user and item embeddings into vectors whose dot product is the relevance score.

import torch
import torch.nn as nn
class TwoTower(nn.Module):
    def __init__(self, n_users, n_items, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.user_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.item_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, user_ids, item_ids):
        u = self.user_mlp(self.user_emb(user_ids))
        i = self.item_mlp(self.item_emb(item_ids))
        return (u * i).sum(dim=-1)  # dot product; L2-normalize u and i for cosine
# After training, item vectors go into an ANN index;
# at runtime each user query maps to N nearest items in milliseconds.

When to use:
- Many items (thousands+) and you need personalization
- Plenty of interaction data (clicks, watches, purchases)
- Personalizing home, email, push
- Cross-sell, related items, 'you might also like'
When to avoid:
- Few items (10–50) — explicit rules are clearer and effective
- Insufficient interaction data — cold-start dominates
- Strict fairness/diversity requirements without baseline evaluation
Filter bubble / reinforcement
Recommending only what a user already liked traps them in a narrow taste profile. Diversity and serendipity metrics are the necessary counterweight.
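A common counterweight in practice is a maximal-marginal-relevance (MMR) style re-rank, trading a little predicted relevance for dissimilarity to items already picked. A minimal sketch with hypothetical inputs:

import numpy as np

def mmr_rerank(scores, item_vecs, k=10, lam=0.7):
    """Greedily pick k items balancing relevance (scores) against similarity to
    items already selected; lam=1.0 is pure relevance, lam=0.0 pure diversity."""
    v = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    selected, remaining = [], list(range(len(scores)))
    while remaining and len(selected) < k:
        def gain(i):
            max_sim = max((float(v[i] @ v[j]) for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * max_sim
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected

picks = mmr_rerank(np.random.rand(100), np.random.rand(100, 64), k=10)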
Cold-start
A new user or new item has no history. Fall back to content features, popular items, and fast onboarding flows that gather signal quickly.
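A sketch of that fallback routing; the thresholds and helper functions (get_interactions, most_popular_items, content_based_similar, collaborative_model) are hypothetical placeholders for whatever the system actually provides.

def recommend_for(user_id, n=10):
    # Hypothetical routing logic; the thresholds are illustrative, not tuned values.
    history = get_interactions(user_id)               # assumed data-access helper
    if not history:
        return most_popular_items(n)                  # no signal: popularity prior
    if len(history) < 5:
        return content_based_similar(history, n)      # thin signal: item features
    return collaborative_model.recommend(user_id, n)  # enough history: full model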
Optimizing offline metrics
High AUC, NDCG, or MAP doesn't guarantee real clicks. Online A/B testing on CTR, conversion, and retention is the only honest measure.
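For reference, a minimal NDCG@k computation over a single ranked list (the graded relevance labels are hypothetical); the point of the pitfall is that this number is cheap to produce and easy to over-trust.

import numpy as np

def ndcg_at_k(relevances, k=10):
    """relevances: graded relevance of items in the order the model ranked them."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float((rel * discounts).sum())
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts).sum())
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k([3, 2, 0, 1], k=4))   # just below 1.0: a zero-relevance item is ranked too high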