AI Dictionary
Beginner · ~2 min read · #ml #fundamentals #beginner

Machine Learning

ML

Letting computers learn patterns from data instead of being explicitly programmed.

[Figure: Learning from data — (x, y) data points and a learned function; the model finds the rule that fits the examples]
Definition

In classical programming you write the rules and the computer applies them. In machine learning it's the opposite: you provide examples (input → output) and the computer figures out the rule on its own.

Three main flavors: supervised (labeled data — e.g. spam/ham), unsupervised (find structure — clustering, anomaly detection), reinforcement (trial-and-error with reward — games, robotics).

The core of ML: define a loss function, then tune the model's parameters to minimize that loss (usually with gradient descent). This loop is called training.
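The loop above can be sketched in a few lines. This is a minimal illustration with made-up toy data (fitting a single weight w to y ≈ 3x by gradient descent on mean squared error), not any particular library's training API:

```python
import numpy as np

# Toy data: the hidden rule is y = 3x plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 3.0 * x + rng.normal(0, 0.1, 100)

w = 0.0   # the model's single parameter, starting from scratch
lr = 0.1  # learning rate

for step in range(200):              # the training loop
    pred = w * x
    loss = np.mean((pred - y) ** 2)  # loss: mean squared error
    grad = np.mean(2 * (pred - y) * x)  # gradient of the loss w.r.t. w
    w -= lr * grad                   # gradient descent step

print(w)  # converges close to 3.0 — the rule was learned, not programmed
```

Real models have millions of parameters instead of one, but the loop is the same: predict, measure the loss, nudge the parameters downhill.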

Analogy

Like teaching a child what a "dog" is. You point at dozens of dogs and say the word. Later, on the street, they recognize a new dog. You never enumerate "four legs, tail, barks..." — they learn from examples. ML does the same.

Real-world example

Netflix's recommender system: 200M users' watch histories. You watched Inception and gave it 5 stars. The model finds "users similar to this one also loved Tenet" and recommends it.

No engineer wrote "if you watched Inception, suggest Tenet." The model derives that rule from millions of clicks, ratings, and views. In production it retrains every day on each user's new behavior.
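The idea behind "users similar to this one also loved Tenet" can be sketched as user-based collaborative filtering with cosine similarity. This is a hypothetical three-user, three-film ratings matrix, not Netflix's actual system:

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = (Inception, Tenet, Titanic).
# 0 means "not rated yet".
ratings = np.array([
    [5.0, 5.0, 1.0],  # user A: loves Nolan films
    [5.0, 0.0, 1.0],  # user B: rated Inception 5, hasn't seen Tenet
    [1.0, 1.0, 5.0],  # user C: prefers Titanic
])

def cosine(u, v):
    # Similarity of two users' taste vectors (1.0 = identical direction)
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

target = ratings[1]  # user B
sims = [cosine(target, ratings[0]), cosine(target, ratings[2])]
neighbor = ratings[0] if sims[0] > sims[1] else ratings[2]

# B hasn't seen Tenet (column 1); the most similar user rated it 5 → recommend.
print(neighbor[1])  # prints 5.0
```

Production recommenders use far richer models, but the core step is the same: find structure in other users' behavior and transfer it.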

When to use
  • The pattern is complex and hand-written rules are impossible (vision, NLP)
  • You have plenty of labeled data — ML loves data
  • The environment keeps changing — models can be retrained; rules can't
  • Decisions that beat humans on speed/accuracy (credit scoring, weather)
When not to use
  • Data is sparse or low-quality — garbage in, garbage out
  • Explainability is critical (loan denial, medical decisions) — many ML models are black boxes
  • Business rules are simple and fixed — a plain rule like "if amount > 10000 then flag" is better
  • One-off decisions — building a model takes months
Common pitfalls

Overfitting

The model memorizes the training data and flops in production. Training accuracy 99%, test accuracy 60% is the classic signature. Fixes: a held-out validation set, regularization, more data.
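The train/test gap is easy to demonstrate. In this toy sketch, a degree-9 polynomial threads exactly through 10 noisy training points (near-zero training error) but misses fresh points drawn from the same curve:

```python
import numpy as np

rng = np.random.default_rng(1)
# 10 noisy samples of sin(2πx) for training, 10 fresh ones for testing
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)
x_test = np.linspace(0.05, 0.95, 10)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 10)

def mse(coeffs, x, y):
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

# Degree-9 polynomial through 10 points: enough capacity to memorize the noise
overfit = np.polyfit(x_train, y_train, 9)

train_mse = mse(overfit, x_train, y_train)  # near zero: memorized
test_mse = mse(overfit, x_test, y_test)     # much larger on unseen data
print(train_mse, test_mse)
```

Checking error on data the model never saw is exactly what a validation set is for.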

Data leakage

Future info leaks into training data. E.g. predicting 'will this customer take a loan?' and accidentally including 'loan start date' as a feature → 100% accuracy in training, useless live.
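The loan example can be shown concretely. In this made-up dataset, "loan start date" only exists for customers who already took the loan, so it is a perfect proxy for the label — the model scores 100% in training while learning nothing usable:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
took_loan = rng.integers(0, 2, n)  # label: did the customer take a loan?

# Leaky feature: a start date exists only if the loan was taken.
# Encoding "missing" as 0 makes the feature encode the answer itself.
loan_start = np.where(took_loan == 1, rng.uniform(1, 365, n), 0.0)

# Even the trivial rule "predict 1 if a start date exists" is perfect in training...
pred = (loan_start > 0).astype(int)
accuracy = (pred == took_loan).mean()
print(accuracy)  # prints 1.0

# ...but at prediction time no applicant has a start date yet,
# so the feature is always 0 and the model is useless live.
```

The fix: ask, for every feature, "would this value be available at the moment the prediction is made?"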

Bias amplification

Historical bias in data (e.g. mostly male engineers hired) gets learned and locked in by the model. Fair model = fair data + fair evaluation.