Top-p
Nucleus Sampling
When picking the next token, sample only from the smallest set of tokens whose cumulative probability exceeds a threshold p.
For every token an LLM produces, it computes a probability distribution over the entire vocabulary (e.g. 50K tokens, each with a probability). Top-p (a.k.a. nucleus sampling) limits sampling to only the most likely tokens.
How it works: sort tokens by probability. Walk down the sorted list,
summing probabilities, until the cumulative probability exceeds p.
What's left is the "nucleus" — a small subset. Only sample from it.
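The sort-and-accumulate procedure above can be sketched in plain Python (a minimal illustration over a toy token distribution, not any library's actual implementation):

```python
def top_p_filter(probs, p=0.9):
    """probs: dict of token -> probability. Returns the renormalized nucleus."""
    # Sort tokens by probability, most likely first.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = {}, 0.0
    for token, prob in ranked:
        nucleus[token] = prob
        cumulative += prob
        if cumulative >= p:      # stop once the nucleus covers p
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(nucleus.values())
    return {tok: prob / total for tok, prob in nucleus.items()}

probs = {"the": 0.5, "a": 0.3, "an": 0.1, "this": 0.05, "that": 0.05}
print(top_p_filter(probs, p=0.85))  # only the top 3 tokens survive
```

Note that the token that pushes the cumulative sum past p is included, so the nucleus always contains at least one token.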
Typical values:
- top_p = 1.0: all tokens included (unbounded variety, risky)
- top_p = 0.9: standard "slightly varied" (good default)
- top_p = 0.5: narrow, more conservative
- top_p = 0.1: very narrow, near-deterministic
Used with temperature — top_p shapes which tokens are even eligible, temperature shapes their selection probabilities.
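That division of labor can be made concrete in a small stdlib-only sketch (an illustration of the usual ordering, temperature before the top-p cutoff; real implementations differ in details):

```python
import math
import random

def sample_token(logits, temperature=0.7, top_p=0.9):
    # Temperature rescales logits before softmax: lower T sharpens the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p then decides which tokens are even eligible.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cum = [], 0.0
    for i in ranked:
        nucleus.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Sample among the surviving tokens, weighted by their probabilities.
    weights = [probs[i] for i in nucleus]
    return random.choices(nucleus, weights=weights)[0]

# With a very narrow nucleus, only the top token is ever eligible:
print(sample_token([5.0, 1.0, 0.0], temperature=1.0, top_p=0.1))  # always 0
```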
Like restricting a restaurant menu. Leave the whole menu open (top_p=1) and the waiter might bring you an obscure aged dish (risky). Limit it to the 90% most-ordered favorites (top_p=0.9) and the weird picks are gone, but there's still variety. Limit it to only the top few dishes (top_p=0.1) and you always get one of the same 3-4 meals.
Generating a creative story opener with GPT-4.
top_p = 1.0: "Once upon a time" / "It was a dark and stormy night" / "Wherein the silver moths of dawn..." → sometimes cliché, sometimes too weird.
top_p = 0.9: "Once upon a time" / "Long ago in a forgotten kingdom" / "She had always known..." → varied yet sensible.
top_p = 0.3: "Once upon a time" / "There was once" / "Long ago" → safe, cliché, predictable.
A good production starting point is temperature=0.7, top_p=0.9.
OpenAI's docs recommend tuning one or the other, not both at once.
- When you want creative/varied output — top_p=0.9-0.95
- Preventing rare weird tokens (use top_p=0.95 over 1.0)
- Classification/structured output — top_p=0.1 for determinism
- Long-form text — top_p stabilizes quality across length
- Aggressively combining with temperature (e.g. temperature at 1.5+ while top_p stays at 1.0 → chaos; note top_p itself is capped at 1.0)
- Treating it alone as a 'quality knob' — prompt + examples matter more
- JSON/structured output — use low temperature + low top_p
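The guidance above can be collapsed into a small settings table (the preset names and exact temperature values are illustrative choices, not an official recommendation):

```python
# Hypothetical presets derived from the guidelines above; tune per task.
SAMPLING_PRESETS = {
    "creative":   {"temperature": 0.7, "top_p": 0.9},   # varied yet sensible
    "long_form":  {"temperature": 0.7, "top_p": 0.95},  # suppress rare weird tokens
    "structured": {"temperature": 0.0, "top_p": 0.1},   # JSON/classification, near-deterministic
}

print(SAMPLING_PRESETS["structured"])
```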
Confusing top_p with temperature
Most guides say 'tune one, leave the other default.' Tweaking both aggressively yields unpredictable results. Even OpenAI's own docs warn about this.
top_p = 0 ≠ temperature = 0
top_p=0 doesn't reliably mean the model picks only the top token; it's an edge case whose handling varies by implementation. For full determinism, set temperature=0 instead.
Manual tuning on reasoning models
o1, Claude reasoning, etc. don't expose temperature/top_p — the model self-optimizes. Manual interference usually backfires.