AI Dictionary
Beginner · ~2 min read · #temperature #sampling #inference

Temperature

The randomness dial

A numeric parameter (typically 0–2) that controls how consistent (deterministic) or varied (creative) the LLM's output is.

[Diagram: randomness dial from 0.0 (deterministic) through 0.7 (balanced) to 1.5+ (creative · risky). Same prompt, two sampled outputs at each setting:
  T = 0.0 → "Once upon a time" / "Once upon a time"
  T = 0.7 → "Once upon a time" / "Long ago, deep in…"
  T = 1.5 → "Wherein silver moths" / "Beyond the velvet sky"]
Definition

When generating each token, an LLM assigns probabilities to every possible next token. After "Once upon a", it might assign "time" (~70%), "morning" (~5%), "fish" (~0.1%), with the rest spread across other tokens. Temperature reshapes that distribution.

  • 0 = always pick the most likely token (deterministic). Same prompt → same output.
  • 0.7 = sample with moderately reshaped probabilities (balanced).
  • 1.5+ = boldly sample even low-probability tokens (creative, unpredictable).

Mathematically, temperature is a divisor applied to the logits before the softmax: p_i = exp(z_i / T) / Σ_j exp(z_j / T). Equivalently, each probability is raised to the power 1/T and the distribution is renormalized. Temperature is often combined with top-p (nucleus) and top-k sampling.
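A minimal sketch of that rescaling in NumPy (the logits below are made-up illustrative numbers, not from any real model):

```python
import numpy as np

def temperature_softmax(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Turn raw logits into a next-token distribution at a given temperature."""
    if temperature == 0:
        # Limit case: greedy decoding, all probability mass on the argmax.
        probs = np.zeros_like(logits, dtype=float)
        probs[np.argmax(logits)] = 1.0
        return probs
    scaled = logits / temperature   # divide logits by T
    scaled -= scaled.max()          # subtract max for numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum()        # renormalize (softmax)

# Made-up logits for the candidates "time", "morning", "fish"
logits = np.array([5.0, 2.4, -1.5])
for t in (0.0, 0.7, 1.5):
    print(f"T={t}: {np.round(temperature_softmax(logits, t), 3)}")
```

Low T concentrates probability on the top token; high T flattens the distribution so rare tokens get real sampling odds.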

Analogy

Think of a DJ's mixer. 0 = single track, same song forever. 0.7 = multiple tracks, smooth mix. 1.5 = random samples from 30 channels, experimental but sometimes chaotic. Same library, different output.

Real-world example

You're building a classification API: tag the user message as "spam | normal | urgent". Use T = 0 — the same input must always produce the same label, otherwise you can't write tests.

Creative writing: suggest blog intros. Use T = 0.8 — each call yields a fresh opening with different tone. With T = 0 you'd get the same cliché every time.
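A sketch of both calls, assuming the OpenAI Python SDK (the model name and prompts are placeholders, not from the original):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Classification: T = 0, the same input must always yield the same label
label = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    temperature=0,
    messages=[
        {"role": "system", "content": "Answer with exactly one of: spam, normal, urgent."},
        {"role": "user", "content": "WIN A FREE CRUISE!!! Click here now"},
    ],
).choices[0].message.content

# Creative writing: T = 0.8, a fresh opening on every call
intro = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0.8,
    messages=[
        {"role": "user", "content": "Suggest an opening paragraph for a blog post about home coffee roasting."},
    ],
).choices[0].message.content
```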

When to use
  • Deterministic output required: classification, extraction, structured output → T ≈ 0
  • Factual Q&A → T ≈ 0.2
  • General chat, help, explanation → T ≈ 0.7
  • Creative writing, brainstorm, variety → T ≈ 0.9–1.2
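The same defaults as a lookup table (illustrative values mirroring the list above; tune per model and task):

```python
# Illustrative temperature defaults; tune per model and task.
TEMPERATURE_DEFAULTS = {
    "classification": 0.0,    # also extraction, structured output
    "factual_qa": 0.2,
    "general_chat": 0.7,
    "creative_writing": 1.0,  # anywhere in the 0.9-1.2 range
}
```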
When not to use
  • Setting T = 0 and complaining 'why is it always the same answer' (that's the point)
  • Setting T = 2 and complaining 'the answer is gibberish' (too high = nonsense token stream)
  • Treating it as a 'creativity' knob: creative ideas come from prompts and examples; T only fine-tunes the variety
Common pitfalls

Temperature ≠ creativity

Higher T gives more varied output, not more creative output. For creative results, use a strong prompt with examples first; tweak T only after.

Forgetting seed for reproducibility

T = 0 alone doesn't guarantee identical output on every model; for reproducibility you may also need a seed parameter (OpenAI supports one, Anthropic doesn't).
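A sketch of a reproducible call with the OpenAI SDK (the seed value is arbitrary, and even with it OpenAI promises only best-effort determinism):

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    temperature=0,        # greedy decoding
    seed=42,              # pins the sampling RNG (best effort)
    messages=[{"role": "user", "content": "Tag as spam, normal, or urgent: 'Meeting at 3?'"}],
)
# If system_fingerprint differs between calls, the backend changed and
# identical seeds may still produce different outputs.
print(resp.system_fingerprint, resp.choices[0].message.content)
```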

Hallucinations spike at high T

T > 1.0 pushes the model toward low-probability (often factually wrong) tokens, so the hallucination rate climbs sharply.