AI Atlas
Intermediate · ~2 min read · #anomaly-detection #outlier #fraud

Anomaly Detection

Spotting the unusual

Automatically finding rare, unexpected examples in data — the backbone of fraud detection, server monitoring, health anomalies, network intrusion.

[Figure: anomaly detection scatter plot; points outside the dense region of normal data are flagged as anomalies.]
Definition

Anomaly detection focuses on rare, off-pattern items in mostly-normal data. Unlike standard classification, the abnormal class is extremely rare (sometimes <0.1%), labels are unreliable, and you cannot enumerate every kind of anomaly in advance. Specialized approaches are needed.

Four strategy families:

- Statistical: z-scores, modified z-scores, IQR rules. Univariate, simple, explainable; breaks down in high dimensions.
- Unsupervised ML: Isolation Forest, One-Class SVM, Local Outlier Factor (LOF), DBSCAN noise labels. Great for high-dimensional data, no labels required.
- Semi-/supervised: if some labels exist, use class weighting, SMOTE, or focal loss with gradient boosting.
- Deep learning: autoencoder reconstruction error, VAEs, GAN-based approaches. Strong for images, audio, and sequences.
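The statistical family is the easiest to sketch. Below is a minimal illustration of two of the rules named above, the modified z-score (median/MAD based, robust to the outliers themselves) and the Tukey IQR fence; the sample data is made up for demonstration.

```python
import numpy as np

def modified_z_scores(x):
    """Robust z-scores built from the median and the median absolute deviation."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    # 0.6745 rescales MAD to be comparable to a standard deviation for normal data
    return 0.6745 * (x - med) / mad

def iqr_outliers(x, k=1.5):
    """Classic Tukey fences: flag points beyond k * IQR from the quartiles."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

x = np.array([10.1, 9.8, 10.3, 9.9, 10.0, 25.0])
print(iqr_outliers(x))                       # only the 25.0 point is flagged
print(np.abs(modified_z_scores(x)) > 3.5)    # same point, by the robust z-score rule
```

Both rules flag only the 25.0 reading. Note these are univariate: each feature is tested on its own, which is exactly why they miss anomalies that are only unusual in combination.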

Crucially, "anomaly" is context-dependent. CPU at 95% is an anomaly for a web server but normal for a Friday-night game server. Without business input there is no meaningful threshold.

Analogy

A customs officer at a port. 50,000 containers pass per day; most come from expected shippers carrying expected goods. The officer notices the odd one: a container marked "cotton" weighing twice the norm; a "furniture" shipment from a port no record exists for; a route with three unusual hops. They don't know all the rules ahead of time — they recognize deviations from expectation. Anomaly detection is the math behind that intuition.

Real-world example

A cloud provider collects telemetry from 100,000 servers every second: CPU, memory, disk, network, error rate. Which server is about to fail?

First attempt: threshold rules (CPU > 90%, errors > 100/h). Result: 5,000 alarms a day, 95% false. Alarm fatigue — everyone ignores them.

Better: an Isolation Forest scores an 8-feature vector per server, producing a high score when a server sits far from the historical clusters. Add per-server baselines so an anomaly means deviation from this server's normal, not an absolute limit. Down to 50 alarms a day, 80% real. Trust restored.

Code examples
Isolation Forest · unsupervised · Python
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# X: your (n_samples, n_features) feature matrix
X_scaled = StandardScaler().fit_transform(X)

iso = IsolationForest(
    n_estimators=200,
    contamination=0.01,    # ~1% anomalies expected
    random_state=42,
)
iso.fit(X_scaled)

labels = iso.predict(X_scaled)   # -1 anomaly, 1 normal
scores = iso.score_samples(X_scaled)  # lower = more anomalous

worst = scores.argsort()[:100]   # indices of the 100 most anomalous samples
Autoencoder · reconstruction error · Python
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, dim=20, latent=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 16), nn.ReLU(), nn.Linear(16, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 16), nn.ReLU(), nn.Linear(16, dim))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z)

# Train on NORMAL data only
# Anomalies → high reconstruction error
model = AE().train()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for x in normal_loader:   # DataLoader over normal samples only
    recon = model(x)
    loss = nn.functional.mse_loss(recon, x)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
with torch.no_grad():
    errors = ((model(X_test) - X_test) ** 2).mean(dim=1)
    threshold = errors.quantile(0.99)
    anomalies = errors > threshold
When to use
  • Positive class very rare (<1%), supervised models struggle
  • Few or no labels — must learn 'normal' from data
  • Server telemetry, fraud, network traffic, sensor data
  • Need to flag novel anomaly types you haven't seen
When not to use
  • Balanced classes — straight classification is more efficient
  • Definition of 'anomaly' isn't clear — start with business definition
  • Strict explainability requirements where the chosen model is too opaque
Common pitfalls

Threshold without business input

The 99th percentile of scores isn't 'anomalous'; it's just a percentile. Without weighing the cost of a false alarm against the cost of a miss, no threshold is meaningful.
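One way to make that trade-off concrete is to sweep candidate thresholds and pick the one minimizing expected cost. This is a sketch on synthetic scores; the per-alarm and per-miss costs are made-up assumptions you would replace with business numbers.

```python
import numpy as np

# Hypothetical costs per event (assumptions, not given in the text)
COST_ALARM = 5.0    # analyst time spent triaging one raised alarm
COST_MISS = 500.0   # damage from one undetected anomaly

def expected_cost(scores, is_anomaly, threshold):
    """Total operating cost at a given alert threshold."""
    flagged = scores >= threshold
    misses = np.sum(~flagged & is_anomaly)
    return COST_ALARM * flagged.sum() + COST_MISS * misses

# Synthetic anomaly scores: 990 normal points, 10 anomalies scoring higher
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0, 1, 990), rng.normal(4, 1, 10)])
is_anomaly = np.concatenate([np.zeros(990, bool), np.ones(10, bool)])

# Sweep candidate thresholds and keep the cheapest one
candidates = np.quantile(scores, np.linspace(0.5, 0.999, 200))
best = min(candidates, key=lambda t: expected_cost(scores, is_anomaly, t))
```

Raising COST_MISS pushes the chosen threshold down (more alarms tolerated); raising COST_ALARM pushes it up. The percentile alone never encodes that choice.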

Drift leakage

What's 'normal' at training time shifts in production. Holidays, releases, and seasonality all move the baseline. Retrain regularly.

Single global threshold

100K servers with one threshold is meaningless — each has its own normal. Per-entity baselines + deviation are far more accurate.
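A minimal sketch of per-entity baselines, using pandas on hypothetical CPU telemetry (the server names and distributions below are invented for illustration): each server's readings are scored against its own mean and standard deviation rather than a global cutoff.

```python
import numpy as np
import pandas as pd

# Hypothetical telemetry: 100 CPU readings for each of two servers
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "server": np.repeat(["web-1", "game-1"], 100),
    "cpu": np.concatenate([rng.normal(30, 5, 100),    # web server idles low
                           rng.normal(90, 3, 100)]),  # game server runs hot
})

# Deviation from EACH server's own baseline, not one global threshold
g = df.groupby("server")["cpu"]
df["z"] = (df["cpu"] - g.transform("mean")) / g.transform("std")
df["anomalous"] = df["z"].abs() > 3
```

Under this scheme a 95% CPU reading is many standard deviations out for web-1 but unremarkable for game-1, which is exactly the web-server vs game-server distinction from the Definition section.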