AI Dictionary
Advanced · ~2 min read · #gan #adversarial #generative

GAN

Generative Adversarial Network

A generative model where two neural networks (generator + discriminator) train against each other — one fakes samples, the other tries to spot fakes.

Diagram: Generator vs Discriminator (adversarial). The generator turns random noise into a fake sample ("fool the cop"); the discriminator judges "real or fake?" ("catch the fakes"). Each round both get better — forger vs detective, both grow expert through competition.
Definition

A GAN (Generative Adversarial Network), proposed by Ian Goodfellow in 2014, trains two models in parallel:

- Generator (G): produces fake samples from random noise. Goal: look real.
- Discriminator (D): looks at a sample and says "real or fake?" Goal: catch the forger.

The two play an adversarial game: as G gets better, D must get better; as D improves, G must get more cunning. Ideally, training converges to a point where G's samples are indistinguishable from real ones.
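The adversarial objective can be sketched as two opposing losses. Below is a minimal numpy illustration (function names are ours, not from any library): the discriminator minimizes the standard binary cross-entropy objective, while the generator uses the common "non-saturating" variant from the original GAN paper.

```python
import numpy as np

def d_loss(d_real, d_fake, eps=1e-8):
    """Discriminator loss: maximize log D(x) + log(1 - D(G(z))).
    d_real / d_fake are D's probabilities (in (0, 1)) for real
    and generated samples respectively."""
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def g_loss(d_fake, eps=1e-8):
    """Non-saturating generator loss: maximize log D(G(z)),
    i.e. push D to believe the fakes are real."""
    return -np.mean(np.log(d_fake + eps))

# A confident, correct discriminator: real -> ~1, fake -> ~0.
print(d_loss(np.array([0.95]), np.array([0.05])))  # low: D is winning
print(g_loss(np.array([0.05])))                    # high: G is being caught
```

At the theoretical equilibrium, D outputs 0.5 everywhere and the discriminator loss settles at 2·log 2 ≈ 1.386.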

GANs were the standard for image generation from 2014 to 2020 (StyleGAN, BigGAN, Progressive GAN). Diffusion models have since taken over, but GANs remain active in deepfakes, super-resolution, image-to-image translation (CycleGAN), and medical imaging.

Analogy

A counterfeiter and a cop. The counterfeiter prints fake bills (G). The cop tries to catch fakes (D). As the counterfeiter improves, the cop must too; as the cop improves, the counterfeiter must. Both end up expert — the counterfeit becomes indistinguishable from real.

Real-world example

Inside NVIDIA's StyleGAN3:
- Generator: starts from a 512-dim random vector and progressively upsamples it to a 1024×1024 face photo.
- Discriminator: looks at an image and asks "is this from the FFHQ real-faces dataset?"
- After training on roughly 25M images, G produces the uncanny-real faces published on "thispersondoesnotexist.com."

The same logic powers CycleGAN for season transfer (summer ↔ winter), pix2pix for sketch → realistic image, and BigGAN across all ImageNet categories.

When to use
  • One-shot fast generation — a GAN samples in a single forward pass, while diffusion typically needs ~30 denoising steps
  • Image-to-image translation (CycleGAN: style transfer)
  • Super-resolution — low → high resolution
  • Deepfake / face-swap apps
  • Training data augmentation
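The speed argument in the first bullet comes down to call counts. A deliberately tiny numpy toy (both "models" below are placeholders, not real networks): the GAN generator is one matrix multiply from noise to sample, while the diffusion-style sampler must loop sequentially over many refinement steps.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))  # toy stand-in for trained generator weights

def gan_sample(z):
    """GAN sampling: one forward pass from noise to sample."""
    return np.tanh(z @ W.T)

def diffusion_sample(shape, steps=30):
    """Toy stand-in for a diffusion sampler: start from pure noise and
    refine sequentially (the update rule is a placeholder, not a real
    denoiser) -- the point is the 30 dependent steps."""
    x = rng.standard_normal(shape)
    for _ in range(steps):
        x = x - 0.05 * x  # placeholder "denoising" update
    return x

z = rng.standard_normal((4, 8))
print(gan_sample(z).shape)               # one call
print(diffusion_sample((4, 16)).shape)   # 30 sequential calls
```

The steps in a diffusion sampler cannot be parallelized across time, which is why the 1-vs-30 gap translates directly into wall-clock latency.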
When not to use
  • When you need broad diversity — diffusion gives better variety
  • Text-conditional generation — diffusion is more flexible at conditioning
  • If you're new — GANs are hard to train; diffusion is more forgiving
  • Production quality bar — current SOTA is now diffusion
Common pitfalls

Mode collapse

The generator loses variety and produces the same one or two modes over and over (1,000 "different" faces turn out to be variants of 5 templates). It has found an easy way to fool D.
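A crude way to spot mode collapse is to measure sample diversity directly. The sketch below (our own hypothetical check, not a standard metric) compares the mean pairwise distance of healthy, spread-out samples against collapsed ones that all cluster around a single point.

```python
import numpy as np

def mean_pairwise_distance(samples):
    """Average L2 distance over all ordered pairs of samples --
    a crude diversity score; a value near zero suggests mode collapse."""
    diffs = samples[:, None, :] - samples[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    n = len(samples)
    return dists.sum() / (n * (n - 1))

rng = np.random.default_rng(0)
healthy = rng.standard_normal((200, 64))  # diverse samples
collapsed = (rng.standard_normal((1, 64)).repeat(200, axis=0)
             + 0.01 * rng.standard_normal((200, 64)))  # one mode + tiny jitter

print(mean_pairwise_distance(healthy)
      > 10 * mean_pairwise_distance(collapsed))  # True
```

In practice people also use metrics like recall/coverage or simply eyeball large sample grids, but a collapsing diversity score over training is an early warning sign.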

Training instability

Balancing G and D is hard: if one becomes too strong, the other stops learning. Hyperparameter tuning is an art; spectral normalization and gradient penalties are the typical stabilizers.

Evaluation is hard

Measuring "is this GAN good?" is tricky. FID (Fréchet Inception Distance) and IS (Inception Score) are the standard but imperfect metrics; human evaluation is usually still needed.
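To make FID concrete, here is a deliberately simplified univariate version (real FID fits multivariate Gaussians to Inception-v3 features, not raw values; the function name is ours). For two 1-D Gaussians the Fréchet distance reduces to (μ₁ − μ₂)² + (σ₁ − σ₂)².

```python
import numpy as np

def frechet_distance_1d(x, y):
    """Fréchet distance between univariate Gaussians fitted to samples
    x and y. Lower is better; 0 means the fitted Gaussians match."""
    mu_x, mu_y = x.mean(), y.mean()
    s_x, s_y = x.std(), y.std()
    return (mu_x - mu_y) ** 2 + (s_x - s_y) ** 2

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 10_000)
good_fake = rng.normal(0.05, 1.0, 10_000)  # close distribution -> small score
bad_fake = rng.normal(2.0, 0.3, 10_000)    # off distribution   -> large score

print(frechet_distance_1d(real, good_fake)
      < frechet_distance_1d(real, bad_fake))  # True
```

The multivariate version adds a covariance term, Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½), which is why real FID implementations need a matrix square root.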