GAN
Generative Adversarial Network
A generative model where two neural networks (generator + discriminator) train against each other — one fakes samples, the other tries to spot fakes.
A GAN (Generative Adversarial Network), introduced by Ian Goodfellow and collaborators in 2014, trains two models in parallel:
- Generator (G): produces fake samples from random noise. Goal: look real.
- Discriminator (D): looks at a sample and says "real or fake?" Goal: catch the forger.
The two play an adversarial game: as G gets better, D must get better; as D improves, G must get more cunning. In theory the game ends with G producing samples indistinguishable from real ones; in practice that equilibrium is hard to reach cleanly.
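A minimal sketch of one training step in PyTorch (my own illustrative code, not from the original paper). It assumes G maps flat noise vectors to samples and D ends in a sigmoid, so D's output is a probability of "real" with shape (batch, 1):

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_g, opt_d, real_batch, z_dim=100):
    batch = real_batch.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Train D: push D(real) toward 1 and D(fake) toward 0.
    z = torch.randn(batch, z_dim)
    fake = G(z).detach()  # detach so this step doesn't update G
    d_loss = (F.binary_cross_entropy(D(real_batch), real_labels)
              + F.binary_cross_entropy(D(fake), fake_labels))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Train G: generate fresh fakes and push D(fake) toward 1
    #    (the standard "non-saturating" generator loss).
    z = torch.randn(batch, z_dim)
    g_loss = F.binary_cross_entropy(D(G(z)), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    return d_loss.item(), g_loss.item()
```

Note the alternation: D gets a gradient step on detached fakes, then G gets a step through a live D. This back-and-forth is the adversarial game in code.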
GANs were the standard for image generation from 2014 to about 2020 (StyleGAN, BigGAN, Progressive GAN). Diffusion models have since taken over the frontier, but GANs remain in active use: deepfakes, super-resolution, image-to-image translation (CycleGAN), and medical imaging.
A counterfeiter and a cop. The counterfeiter (G) prints fake bills; the cop (D) tries to catch the fakes. As the counterfeiter improves, the cop must too; as the cop improves, so must the counterfeiter. Both end up experts, and the counterfeits become indistinguishable from real bills.
Inside NVIDIA's StyleGAN3:
- Generator: starts from a 512-dim random vector and progressively upsamples it to a 1024×1024 face photo.
- Discriminator: looks at a sample and asks "is this from the FFHQ real-faces dataset?"
- After long training (NVIDIA's official runs show D on the order of 25M real images), G produces uncanny-real faces like those published on "thispersondoesnotexist.com."
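For intuition, here is a toy DCGAN-style generator in PyTorch that upsamples a noise vector into a 64×64 image. This is a simplified stand-in showing only the progressive-upsampling idea; StyleGAN3's actual architecture (style mapping network, modulated convolutions, alias-free resampling) is far more involved:

```python
import torch.nn as nn

# Toy DCGAN-style generator: (batch, 128, 1, 1) noise -> 64x64 RGB image.
# Each ConvTranspose2d(kernel=4, stride=2, padding=1) doubles resolution.
generator = nn.Sequential(
    nn.ConvTranspose2d(128, 512, 4, 1, 0), nn.BatchNorm2d(512), nn.ReLU(),  # 1x1 -> 4x4
    nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(),  # 4x4 -> 8x8
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),  # 8x8 -> 16x16
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),    # 16x16 -> 32x32
    nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),                          # 32x32 -> 64x64, [-1, 1]
)
```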
Same logic powers CycleGAN for season transfer (summer ↔ winter), pix2pix for sketch → realistic image translation, and BigGAN for class-conditional generation across all ImageNet categories.
When to use
- Single-pass fast generation (diffusion needs ~30 denoising steps; a GAN samples in 1 forward pass)
- Image-to-image translation (CycleGAN: style transfer)
- Super-resolution — low → high resolution
- Deepfake / face-swap apps
- Training data augmentation
When not to use
- When you need broad sample diversity — diffusion gives better variety
- Text-conditional generation — diffusion is more flexible at conditioning
- If you're new — GANs are hard to train; diffusion is more forgiving
- Production quality bar — current SOTA for image generation is diffusion
Common pitfalls

Mode collapse
The generator loses variety and produces the same 1-2 types over and over (1,000 'different' faces turn out to be variants of ~5). It has found an easy way to fool D and stops exploring.
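One crude way to watch for this during training (an ad-hoc smoke test, not a standard metric): track the average pairwise distance between generated samples. If it trends toward zero, G is likely collapsing onto a few modes.

```python
import torch

def sample_diversity(G, n=64, z_dim=100):
    """Mean pairwise L2 distance between n generated samples."""
    with torch.no_grad():
        fakes = G(torch.randn(n, z_dim))    # assumes G takes flat noise vectors
        fakes = fakes.flatten(start_dim=1)  # (n, features)
    return torch.pdist(fakes).mean().item() # near 0 => suspicious sameness
```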
Training instability
Balancing G and D is hard: if one becomes too strong, the other stops learning. Hyperparameter tuning is an art; spectral normalization and gradient penalties are the typical stabilization fixes.
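As a concrete example of one such fix, PyTorch ships spectral normalization as torch.nn.utils.spectral_norm; wrapping each weight layer of the discriminator constrains how sharply it can change, which usually stabilizes training. The layer sizes below are illustrative (a discriminator for 64×64 inputs):

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Discriminator for 64x64 RGB inputs with spectral norm on every
# weight layer (illustrative sizes; real models are usually deeper).
discriminator = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, 4, 2, 1)),    # 64x64 -> 32x32
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(64, 128, 4, 2, 1)),  # 32x32 -> 16x16
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    spectral_norm(nn.Linear(128 * 16 * 16, 1)),
    nn.Sigmoid(),                                # probability of "real"
)
```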
Evaluation is hard
Measuring 'is this GAN good?' is tricky. FID (Fréchet Inception Distance) and IS (Inception Score) are the standard metrics, but both are imperfect proxies; human evaluation is usually still needed.
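A sketch of computing FID with the torchmetrics library (FrechetInceptionDistance is a real torchmetrics class, though defaults can vary by version; the random tensors below just stand in for actual image batches):

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)  # 2048-dim Inception features

# torchmetrics expects uint8 image tensors of shape (N, 3, H, W) by default.
# Random stand-in data here; in practice feed thousands of real/generated
# images, since FID estimates are noisy at small sample sizes.
real_images = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print(f"FID: {fid.compute().item():.2f}")  # lower = distributions closer
```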