AI Dictionary
Advanced · ~2 min read · #gan #adversarial #generative

GAN

Generative Adversarial Network

A generative model where two neural networks (generator + discriminator) train against each other — one fakes samples, the other tries to spot fakes.

Diagram: Generator vs Discriminator (adversarial). The generator turns random noise into a fake sample ("fool the cop"); the discriminator judges "real or fake?" ("catch the fakes"). Each round both get better — forger vs detective, both grow expert through competition.
Definition

A GAN (Generative Adversarial Network), proposed by Ian Goodfellow in 2014, trains two models in parallel:

- Generator (G): produces fake samples from random noise. Goal: look real.
- Discriminator (D): looks at a sample and says "real or fake?" Goal: catch the forger.

The two play an adversarial game: as G gets better, D must get better; as D improves, G must get more cunning. Ideally, training converges to a point where G's samples are indistinguishable from real ones.
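The adversarial objective can be sketched as two opposing losses. Below is a minimal numpy illustration (function names are ours, not from any library): the discriminator minimizes the standard binary cross-entropy objective, while the generator uses the common "non-saturating" variant from the original GAN paper.

```python
import numpy as np

def d_loss(d_real, d_fake, eps=1e-8):
    """Discriminator loss: maximize log D(x) + log(1 - D(G(z))).
    d_real / d_fake are D's probabilities (in (0, 1)) for real
    and generated samples respectively."""
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def g_loss(d_fake, eps=1e-8):
    """Non-saturating generator loss: maximize log D(G(z)),
    i.e. push D to believe the fakes are real."""
    return -np.mean(np.log(d_fake + eps))

# A confident, correct discriminator: real -> ~1, fake -> ~0.
print(d_loss(np.array([0.95]), np.array([0.05])))  # low: D is winning
print(g_loss(np.array([0.05])))                    # high: G is being caught
```

At the theoretical equilibrium, D outputs 0.5 everywhere and the discriminator loss settles at 2·log 2 ≈ 1.386.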

GANs were the standard for image generation from 2014 to 2020 (StyleGAN, BigGAN, Progressive GAN). Diffusion models have since taken over, but GANs remain active in deepfakes, super-resolution, image-to-image translation (CycleGAN), and medical imaging.

Analogy

A counterfeiter and a cop. The counterfeiter prints fake bills (G). The cop tries to catch fakes (D). As the counterfeiter improves, the cop must too; as the cop improves, the counterfeiter must. Both end up expert — the counterfeit becomes indistinguishable from real.

Real-world example

Inside NVIDIA's StyleGAN3:
- Generator: starts from a 512-dim random vector and progressively upsamples it to a 1024×1024 face photo.
- Discriminator: looks at an image and asks "is this from the FFHQ real-faces dataset?"
- After training on roughly 25M images, G produces the uncanny-real faces published on "thispersondoesnotexist.com."

The same logic powers CycleGAN for season transfer (summer ↔ winter), pix2pix for sketch → realistic image, and BigGAN across all ImageNet categories.

When to use
  • One-shot fast generation — a GAN samples in a single forward pass, while diffusion typically needs ~30 denoising steps
  • Image-to-image translation (CycleGAN: style transfer)
  • Super-resolution — low → high resolution
  • Deepfake / face-swap apps
  • Training data augmentation
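The speed argument in the first bullet comes down to call counts. A deliberately tiny numpy toy (both "models" below are placeholders, not real networks): the GAN generator is one matrix multiply from noise to sample, while the diffusion-style sampler must loop sequentially over many refinement steps.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))  # toy stand-in for trained generator weights

def gan_sample(z):
    """GAN sampling: one forward pass from noise to sample."""
    return np.tanh(z @ W.T)

def diffusion_sample(shape, steps=30):
    """Toy stand-in for a diffusion sampler: start from pure noise and
    refine sequentially (the update rule is a placeholder, not a real
    denoiser) -- the point is the 30 dependent steps."""
    x = rng.standard_normal(shape)
    for _ in range(steps):
        x = x - 0.05 * x  # placeholder "denoising" update
    return x

z = rng.standard_normal((4, 8))
print(gan_sample(z).shape)               # one call
print(diffusion_sample((4, 16)).shape)   # 30 sequential calls
```

The steps in a diffusion sampler cannot be parallelized across time, which is why the 1-vs-30 gap translates directly into wall-clock latency.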
When not to use
  • When you need broad diversity — diffusion gives better variety
  • Text-conditional generation — diffusion is more flexible at conditioning
  • If you're new — GANs are hard to train; diffusion is more forgiving
  • Production quality bar — current SOTA is now diffusion
Common pitfalls

Mode collapse

The generator loses variety and produces the same one or two modes over and over (1,000 "different" faces turn out to be variants of 5 templates). It has found an easy way to fool D.
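A crude way to spot mode collapse is to measure sample diversity directly. The sketch below (our own hypothetical check, not a standard metric) compares the mean pairwise distance of healthy, spread-out samples against collapsed ones that all cluster around a single point.

```python
import numpy as np

def mean_pairwise_distance(samples):
    """Average L2 distance over all ordered pairs of samples --
    a crude diversity score; a value near zero suggests mode collapse."""
    diffs = samples[:, None, :] - samples[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    n = len(samples)
    return dists.sum() / (n * (n - 1))

rng = np.random.default_rng(0)
healthy = rng.standard_normal((200, 64))  # diverse samples
collapsed = (rng.standard_normal((1, 64)).repeat(200, axis=0)
             + 0.01 * rng.standard_normal((200, 64)))  # one mode + tiny jitter

print(mean_pairwise_distance(healthy)
      > 10 * mean_pairwise_distance(collapsed))  # True
```

In practice people also use metrics like recall/coverage or simply eyeball large sample grids, but a collapsing diversity score over training is an early warning sign.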

Training instability

Balancing G and D is hard: if one becomes too strong, the other stops learning. Hyperparameter tuning is an art; spectral normalization and gradient penalties are the typical stabilizers.

Evaluation is hard

Measuring "is this GAN good?" is tricky. FID (Fréchet Inception Distance) and IS (Inception Score) are the standard but imperfect metrics; human evaluation is usually still needed.
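To make FID concrete, here is a deliberately simplified univariate version (real FID fits multivariate Gaussians to Inception-v3 features, not raw values; the function name is ours). For two 1-D Gaussians the Fréchet distance reduces to (μ₁ − μ₂)² + (σ₁ − σ₂)².

```python
import numpy as np

def frechet_distance_1d(x, y):
    """Fréchet distance between univariate Gaussians fitted to samples
    x and y. Lower is better; 0 means the fitted Gaussians match."""
    mu_x, mu_y = x.mean(), y.mean()
    s_x, s_y = x.std(), y.std()
    return (mu_x - mu_y) ** 2 + (s_x - s_y) ** 2

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 10_000)
good_fake = rng.normal(0.05, 1.0, 10_000)  # close distribution -> small score
bad_fake = rng.normal(2.0, 0.3, 10_000)    # off distribution   -> large score

print(frechet_distance_1d(real, good_fake)
      < frechet_distance_1d(real, bad_fake))  # True
```

The multivariate version adds a covariance term, Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½), which is why real FID implementations need a matrix square root.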