AI Atlas
Intermediate · ~2 min read · #anomaly-detection #outlier #fraud

Anomaly Detection

Spotting the unusual

Automatically finding rare, unexpected examples in data — the backbone of fraud detection, server monitoring, health anomalies, network intrusion.

[Figure: anomaly detection scatter plot; points outside the dense region of normal data are flagged as anomalies.]
Definition

Anomaly detection focuses on rare, off-pattern items in mostly-normal data. Unlike standard classification, the abnormal class is extremely rare (sometimes <0.1%), labels are unreliable, and you cannot enumerate every kind of anomaly in advance. Specialized approaches are needed.

Four strategy families:

- Statistical: z-scores, modified z-scores, IQR rules. Univariate, simple, explainable; breaks down in high dimensions.
- Unsupervised ML: Isolation Forest, One-Class SVM, Local Outlier Factor (LOF), DBSCAN noise labels. Great for high-dimensional data, no labels required.
- Semi-/supervised: if some labels exist, use class weighting, SMOTE, or focal loss with gradient boosting.
- Deep learning: autoencoder reconstruction error, VAEs, GAN-based approaches. Strong for images, audio, and sequences.
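The statistical family is the easiest to sketch. Below is a minimal illustration of two of the rules named above, the modified z-score (median/MAD based, robust to the outliers themselves) and the Tukey IQR fence; the sample data is made up for demonstration.

```python
import numpy as np

def modified_z_scores(x):
    """Robust z-scores built from the median and the median absolute deviation."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    # 0.6745 rescales MAD to be comparable to a standard deviation for normal data
    return 0.6745 * (x - med) / mad

def iqr_outliers(x, k=1.5):
    """Classic Tukey fences: flag points beyond k * IQR from the quartiles."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

x = np.array([10.1, 9.8, 10.3, 9.9, 10.0, 25.0])
print(iqr_outliers(x))                       # only the 25.0 point is flagged
print(np.abs(modified_z_scores(x)) > 3.5)    # same point, by the robust z-score rule
```

Both rules flag only the 25.0 reading. Note these are univariate: each feature is tested on its own, which is exactly why they miss anomalies that are only unusual in combination.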

Crucially, "anomaly" is context-dependent. CPU at 95% is an anomaly for a web server but normal for a Friday-night game server. Without business input there is no meaningful threshold.

Analogy

A customs officer at a port. 50,000 containers pass per day; most come from expected shippers carrying expected goods. The officer notices the odd one: a container marked "cotton" weighing twice the norm; a "furniture" shipment from a port no record exists for; a route with three unusual hops. They don't know all the rules ahead of time — they recognize deviations from expectation. Anomaly detection is the math behind that intuition.

Real-world example

A cloud provider collects telemetry from 100,000 servers every second: CPU, memory, disk, network, error rate. Which server is about to fail?

First attempt: threshold rules (CPU > 90%, errors > 100/h). Result: 5,000 alarms a day, 95% false. Alarm fatigue — everyone ignores them.

Better: an Isolation Forest scores an 8-feature vector per server, producing a high score when a server sits far from the historical clusters. Add per-server baselines so an anomaly means deviation from this server's normal, not an absolute limit. Down to 50 alarms a day, 80% real. Trust restored.

Code examples
Isolation Forest · unsupervised · Python
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# X: your (n_samples, n_features) feature matrix
X_scaled = StandardScaler().fit_transform(X)

iso = IsolationForest(
    n_estimators=200,
    contamination=0.01,    # ~1% anomalies expected
    random_state=42,
)
iso.fit(X_scaled)

labels = iso.predict(X_scaled)   # -1 anomaly, 1 normal
scores = iso.score_samples(X_scaled)  # lower = more anomalous

worst = scores.argsort()[:100]   # indices of the 100 most anomalous samples
Autoencoder · reconstruction error · Python
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, dim=20, latent=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 16), nn.ReLU(), nn.Linear(16, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 16), nn.ReLU(), nn.Linear(16, dim))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z)

# Train on NORMAL data only
# Anomalies → high reconstruction error
model = AE().train()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for x in normal_loader:   # DataLoader over normal samples only
    recon = model(x)
    loss = nn.functional.mse_loss(recon, x)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
with torch.no_grad():
    errors = ((model(X_test) - X_test) ** 2).mean(dim=1)
    threshold = errors.quantile(0.99)
    anomalies = errors > threshold
When to use
  • Positive class very rare (<1%), supervised models struggle
  • Few or no labels — must learn 'normal' from data
  • Server telemetry, fraud, network traffic, sensor data
  • Need to flag novel anomaly types you haven't seen
When not to use
  • Balanced classes — straight classification is more efficient
  • Definition of 'anomaly' isn't clear — start with business definition
  • Strict explainability requirements where the chosen model is too opaque
Common pitfalls

Threshold without business input

The 99th percentile of scores isn't 'anomalous'; it's just a percentile. Without weighing the cost of a false alarm against the cost of a miss, no threshold is meaningful.
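One way to make that trade-off concrete is to sweep candidate thresholds and pick the one minimizing expected cost. This is a sketch on synthetic scores; the per-alarm and per-miss costs are made-up assumptions you would replace with business numbers.

```python
import numpy as np

# Hypothetical costs per event (assumptions, not given in the text)
COST_ALARM = 5.0    # analyst time spent triaging one raised alarm
COST_MISS = 500.0   # damage from one undetected anomaly

def expected_cost(scores, is_anomaly, threshold):
    """Total operating cost at a given alert threshold."""
    flagged = scores >= threshold
    misses = np.sum(~flagged & is_anomaly)
    return COST_ALARM * flagged.sum() + COST_MISS * misses

# Synthetic anomaly scores: 990 normal points, 10 anomalies scoring higher
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0, 1, 990), rng.normal(4, 1, 10)])
is_anomaly = np.concatenate([np.zeros(990, bool), np.ones(10, bool)])

# Sweep candidate thresholds and keep the cheapest one
candidates = np.quantile(scores, np.linspace(0.5, 0.999, 200))
best = min(candidates, key=lambda t: expected_cost(scores, is_anomaly, t))
```

Raising COST_MISS pushes the chosen threshold down (more alarms tolerated); raising COST_ALARM pushes it up. The percentile alone never encodes that choice.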

Drift leakage

What's 'normal' at training time shifts in production. Holidays, releases, and seasonality all move the baseline. Retrain regularly.

Single global threshold

100K servers with one threshold is meaningless — each has its own normal. Per-entity baselines + deviation are far more accurate.
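A minimal sketch of per-entity baselines, using pandas on hypothetical CPU telemetry (the server names and distributions below are invented for illustration): each server's readings are scored against its own mean and standard deviation rather than a global cutoff.

```python
import numpy as np
import pandas as pd

# Hypothetical telemetry: 100 CPU readings for each of two servers
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "server": np.repeat(["web-1", "game-1"], 100),
    "cpu": np.concatenate([rng.normal(30, 5, 100),    # web server idles low
                           rng.normal(90, 3, 100)]),  # game server runs hot
})

# Deviation from EACH server's own baseline, not one global threshold
g = df.groupby("server")["cpu"]
df["z"] = (df["cpu"] - g.transform("mean")) / g.transform("std")
df["anomalous"] = df["z"].abs() > 3
```

Under this scheme a 95% CPU reading is many standard deviations out for web-1 but unremarkable for game-1, which is exactly the web-server vs game-server distinction from the Definition section.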