Machine Learning
ML
Letting computers learn patterns from data instead of being explicitly programmed.
In classical programming you write the rules and the computer applies them. In machine learning it's the opposite: you provide examples (input → output) and the computer figures out the rule on its own.
Three main flavors: supervised (labeled data — e.g. spam/ham), unsupervised (find structure — clustering, anomaly detection), reinforcement (trial-and-error with reward — games, robotics).
The core of ML: define a loss function, then tune the model's parameters to minimize that loss (usually with gradient descent). This loop is called training.
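That loop can be sketched in a few lines: toy data, a single parameter, mean squared error as the loss, and gradient descent to minimize it. Everything here (the data, the learning rate, the step count) is illustrative, not any particular library's API.

```python
# Fit y = w * x to toy data by gradient descent on mean squared error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # true rule: y = 2x

w = 0.0                      # the single parameter we tune
lr = 0.01                    # learning rate (chosen by hand here)

for step in range(1000):
    # loss = mean((w*x - y)^2); its gradient with respect to w:
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad           # step downhill on the loss

print(round(w, 3))           # prints 2.0 — the model recovered the rule
```

Note that nobody told the loop the answer was 2; it fell out of minimizing the loss on examples, which is the whole point.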
Like teaching a child what a "dog" is. You point at dozens of dogs and say the word. Later, on the street, they recognize a new dog. You never enumerate "four legs, tail, barks..." — they learn from examples. ML does the same.
Netflix's recommender system: 200M users' watch histories. You watched Inception and gave it 5 stars. The model finds "users similar to this one also loved Tenet" and recommends it.
No engineer wrote "if you watched Inception, suggest Tenet." The model derives that pattern from millions of clicks, ratings, and views. In production it is retrained daily as each user's new behavior comes in.
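The "users similar to you also loved X" idea can be sketched as user-to-user collaborative filtering: compare rating vectors with cosine similarity, then recommend what the nearest user rated highly. All users, titles, and ratings below are invented; real recommenders work at vastly larger scale with far more sophisticated models.

```python
from math import sqrt

ratings = {
    "you":   {"Inception": 5, "Interstellar": 4},
    "alice": {"Inception": 5, "Interstellar": 4, "Tenet": 5},
    "bob":   {"Inception": 1, "Interstellar": 5, "Notebook": 5},
}

def similarity(a, b):
    """Cosine similarity over the movies both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    na = sqrt(sum(a[m] ** 2 for m in shared))
    nb = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (na * nb)

me = ratings["you"]
# Find the most similar other user...
best = max((u for u in ratings if u != "you"),
           key=lambda u: similarity(me, ratings[u]))
# ...and recommend what they loved but we haven't seen.
recs = [m for m, r in ratings[best].items() if m not in me and r >= 4]
print(best, recs)  # alice ['Tenet']
```

No "watched Inception → suggest Tenet" rule exists anywhere in this code; the suggestion emerges from the similarity of rating patterns.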
When to use ML
- The pattern is complex and hand-written rules are impossible (vision, NLP)
- You have plenty of labeled data — ML loves data
- The environment keeps changing — models can be retrained; rules can't
- The decision needs to beat humans on speed or accuracy (credit scoring, weather)
When not to use ML
- Data is sparse or low-quality — garbage in, garbage out
- Explainability is critical (loan denial, medical decisions) — many ML models are black boxes
- Business rules are simple and fixed — "if amount > 10000 then flag" is better
- One-off decisions — building a model takes months
Overfitting
The model memorizes training data and flops in production. Training accuracy 99%, test accuracy 60% = overfit. Fixes: validation set, regularization, more data.
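A stripped-down illustration of that train/test gap: a "model" that just memorizes its training examples scores perfectly on data it has seen and collapses on anything new, while a model that learned the underlying rule transfers. The data and models here are toys invented for the demonstration.

```python
train = [(1, 2), (2, 4), (3, 6)]    # true rule: y = 2x
test  = [(4, 8), (5, 10)]           # unseen data

memorized = dict(train)              # pure memorization of training pairs

def memo_model(x):
    return memorized.get(x, 0)       # clueless outside the training set

def general_model(x):
    return 2 * x                     # actually learned the pattern

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print(accuracy(memo_model, train), accuracy(memo_model, test))        # 1.0 0.0
print(accuracy(general_model, train), accuracy(general_model, test))  # 1.0 1.0
```

The memorizer's 100% training accuracy is exactly the trap: evaluation on a held-out set is what exposes it.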
Data leakage
Future info leaks into training data. E.g. predicting 'will this customer take a loan?' and accidentally including 'loan start date' as a feature → 100% accuracy in training, useless live.
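The loan example can be made concrete: 'loan_start_date' only exists for customers who already took the loan, so a "model" that checks that field looks perfect on historical data while knowing nothing usable at prediction time. The customer records below are fabricated for illustration.

```python
customers = [
    {"age": 30, "income": 40_000, "loan_start_date": "2023-01-10", "took_loan": 1},
    {"age": 45, "income": 90_000, "loan_start_date": None,         "took_loan": 0},
    {"age": 52, "income": 60_000, "loan_start_date": "2023-03-02", "took_loan": 1},
]

def leaky_predict(row):
    # The field directly encodes the answer — it is set only AFTER
    # the outcome, so it will never be available for a live prediction.
    return 1 if row["loan_start_date"] is not None else 0

train_acc = sum(leaky_predict(r) == r["took_loan"] for r in customers) / len(customers)
print(train_acc)  # 1.0 — suspiciously perfect

# The fix: drop post-outcome fields before training.
clean = [{k: v for k, v in r.items() if k != "loan_start_date"} for r in customers]
```

A near-perfect training score is itself a red flag: audit every feature and ask "would this value exist at the moment of prediction?"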
Bias amplification
Historical bias in data (e.g. mostly male engineers hired) gets learned and locked in by the model. Fair model = fair data + fair evaluation.
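"Fair evaluation" has a simple concrete form: score the model per group, not just in aggregate, because a decent overall number can hide a large gap for one group. The predictions below are invented to show the mechanics only.

```python
predictions = [
    {"group": "A", "predicted": 1, "actual": 1},
    {"group": "A", "predicted": 1, "actual": 1},
    {"group": "A", "predicted": 0, "actual": 0},
    {"group": "B", "predicted": 0, "actual": 1},   # missed positive in group B
    {"group": "B", "predicted": 0, "actual": 0},
]

def accuracy(rows):
    return sum(r["predicted"] == r["actual"] for r in rows) / len(rows)

overall = accuracy(predictions)
by_group = {
    g: accuracy([r for r in predictions if r["group"] == g])
    for g in {"A", "B"}
}
print(overall, by_group)  # 0.8 overall hides the gap: A scores 1.0, B only 0.5
```

Per-group metrics like these are the minimal first step; dedicated fairness toolkits go much further, but the habit of slicing evaluation by group catches the most obvious locked-in bias.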