Machine Learning
ML
Letting computers learn patterns from data instead of being explicitly programmed.
In classical programming you write the rules and the computer applies them. In machine learning it's the opposite: you provide examples (input → output) and the computer figures out the rule on its own.
Three main flavors: supervised (labeled data — e.g. spam/ham), unsupervised (find structure — clustering, anomaly detection), reinforcement (trial-and-error with reward — games, robotics).
The core of ML: define a loss function, then tune the model's parameters to minimize that loss (usually with gradient descent). This loop is called training.
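That loop can be sketched in a few lines: toy data, a single parameter, mean squared error as the loss, and gradient descent to minimize it. Everything here (the data, the learning rate, the step count) is illustrative, not any particular library's API.

```python
# Fit y = w * x to toy data by gradient descent on mean squared error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # true rule: y = 2x

w = 0.0                      # the single parameter we tune
lr = 0.01                    # learning rate (chosen by hand here)

for step in range(1000):
    # loss = mean((w*x - y)^2); its gradient with respect to w:
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad           # step downhill on the loss

print(round(w, 3))           # prints 2.0 — the model recovered the rule
```

Note that nobody told the loop the answer was 2; it fell out of minimizing the loss on examples, which is the whole point.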
Like teaching a child what a "dog" is. You point at dozens of dogs and say the word. Later, on the street, they recognize a new dog. You never enumerate "four legs, tail, barks..." — they learn from examples. ML does the same.
Netflix's recommender system: 200M users' watch histories. You watched Inception and gave it 5 stars. The model finds "users similar to this one also loved Tenet" and recommends it.
No engineer wrote "if you watched Inception, suggest Tenet." The model derives that pattern from millions of clicks, ratings, and views. In production it is retrained daily as each user's new behavior comes in.
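The "users similar to you also loved X" idea can be sketched as user-to-user collaborative filtering: compare rating vectors with cosine similarity, then recommend what the nearest user rated highly. All users, titles, and ratings below are invented; real recommenders work at vastly larger scale with far more sophisticated models.

```python
from math import sqrt

ratings = {
    "you":   {"Inception": 5, "Interstellar": 4},
    "alice": {"Inception": 5, "Interstellar": 4, "Tenet": 5},
    "bob":   {"Inception": 1, "Interstellar": 5, "Notebook": 5},
}

def similarity(a, b):
    """Cosine similarity over the movies both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    na = sqrt(sum(a[m] ** 2 for m in shared))
    nb = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (na * nb)

me = ratings["you"]
# Find the most similar other user...
best = max((u for u in ratings if u != "you"),
           key=lambda u: similarity(me, ratings[u]))
# ...and recommend what they loved but we haven't seen.
recs = [m for m, r in ratings[best].items() if m not in me and r >= 4]
print(best, recs)  # alice ['Tenet']
```

No "watched Inception → suggest Tenet" rule exists anywhere in this code; the suggestion emerges from the similarity of rating patterns.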
When to use ML
- The pattern is complex and hand-written rules are impossible (vision, NLP)
- You have plenty of labeled data — ML loves data
- The environment keeps changing — models can be retrained; rules can't
- The decision needs to beat humans on speed or accuracy (credit scoring, weather)
When not to use ML
- Data is sparse or low-quality — garbage in, garbage out
- Explainability is critical (loan denial, medical decisions) — many ML models are black boxes
- Business rules are simple and fixed — "if amount > 10000 then flag" is better
- One-off decisions — building a model takes months
Overfitting
The model memorizes training data and flops in production. Training accuracy 99%, test accuracy 60% = overfit. Fixes: validation set, regularization, more data.
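A stripped-down illustration of that train/test gap: a "model" that just memorizes its training examples scores perfectly on data it has seen and collapses on anything new, while a model that learned the underlying rule transfers. The data and models here are toys invented for the demonstration.

```python
train = [(1, 2), (2, 4), (3, 6)]    # true rule: y = 2x
test  = [(4, 8), (5, 10)]           # unseen data

memorized = dict(train)              # pure memorization of training pairs

def memo_model(x):
    return memorized.get(x, 0)       # clueless outside the training set

def general_model(x):
    return 2 * x                     # actually learned the pattern

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print(accuracy(memo_model, train), accuracy(memo_model, test))        # 1.0 0.0
print(accuracy(general_model, train), accuracy(general_model, test))  # 1.0 1.0
```

The memorizer's 100% training accuracy is exactly the trap: evaluation on a held-out set is what exposes it.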
Data leakage
Future info leaks into training data. E.g. predicting 'will this customer take a loan?' and accidentally including 'loan start date' as a feature → 100% accuracy in training, useless live.
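The loan example can be made concrete: 'loan_start_date' only exists for customers who already took the loan, so a "model" that checks that field looks perfect on historical data while knowing nothing usable at prediction time. The customer records below are fabricated for illustration.

```python
customers = [
    {"age": 30, "income": 40_000, "loan_start_date": "2023-01-10", "took_loan": 1},
    {"age": 45, "income": 90_000, "loan_start_date": None,         "took_loan": 0},
    {"age": 52, "income": 60_000, "loan_start_date": "2023-03-02", "took_loan": 1},
]

def leaky_predict(row):
    # The field directly encodes the answer — it is set only AFTER
    # the outcome, so it will never be available for a live prediction.
    return 1 if row["loan_start_date"] is not None else 0

train_acc = sum(leaky_predict(r) == r["took_loan"] for r in customers) / len(customers)
print(train_acc)  # 1.0 — suspiciously perfect

# The fix: drop post-outcome fields before training.
clean = [{k: v for k, v in r.items() if k != "loan_start_date"} for r in customers]
```

A near-perfect training score is itself a red flag: audit every feature and ask "would this value exist at the moment of prediction?"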
Bias amplification
Historical bias in data (e.g. mostly male engineers hired) gets learned and locked in by the model. Fair model = fair data + fair evaluation.
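"Fair evaluation" has a simple concrete form: score the model per group, not just in aggregate, because a decent overall number can hide a large gap for one group. The predictions below are invented to show the mechanics only.

```python
predictions = [
    {"group": "A", "predicted": 1, "actual": 1},
    {"group": "A", "predicted": 1, "actual": 1},
    {"group": "A", "predicted": 0, "actual": 0},
    {"group": "B", "predicted": 0, "actual": 1},   # missed positive in group B
    {"group": "B", "predicted": 0, "actual": 0},
]

def accuracy(rows):
    return sum(r["predicted"] == r["actual"] for r in rows) / len(rows)

overall = accuracy(predictions)
by_group = {
    g: accuracy([r for r in predictions if r["group"] == g])
    for g in {"A", "B"}
}
print(overall, by_group)  # 0.8 overall hides the gap: A scores 1.0, B only 0.5
```

Per-group metrics like these are the minimal first step; dedicated fairness toolkits go much further, but the habit of slicing evaluation by group catches the most obvious locked-in bias.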