Temperature
The randomness dial
A numeric parameter (typically 0–2) that controls how consistent (deterministic) or varied (creative) the LLM's output is.
When generating each token, an LLM assigns probabilities to all possible next tokens. After "Once upon a" → "time" (~70%), "morning" (~5%), "fish" (~0.1%). Temperature reshapes that distribution.
0 = always pick the most likely token (deterministic). Same prompt → same output. 0.7 = mix probabilities moderately (balanced). 1.5+ = boldly sample even low-probability tokens (creative, unpredictable).
Mathematically, temperature divides the logits before the softmax: p_i = softmax(z_i / T). Equivalently, each probability gets rescaled as p^(1/T) and the distribution is renormalized. Temperature is often combined with top-p (nucleus) and top-k sampling.
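The softmax formula above can be sketched in a few lines of stdlib Python. The logit values are made up for illustration; only the rescaling behavior matters:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by T, then softmax. Lower T sharpens the
    distribution toward the top token; higher T flattens it."""
    if temperature <= 0:
        raise ValueError("T = 0 means greedy argmax, not a softmax")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for "time", "morning", "fish" after "Once upon a"
logits = [5.0, 2.0, -1.0]
baseline = softmax_with_temperature(logits, 1.0)  # unmodified softmax
sharp = softmax_with_temperature(logits, 0.5)     # top token dominates even more
flat = softmax_with_temperature(logits, 2.0)      # rare tokens gain probability mass
```

Comparing the three outputs makes the "dial" concrete: the top token's probability rises as T drops and falls as T rises, while the distribution always sums to 1.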
Think of a DJ's mixer. 0 = single track, same song forever. 0.7 = multiple tracks, smooth mix. 1.5 = random samples from 30 channels, experimental but sometimes chaotic. Same library, different output.
You're building a classification API: tag the user message as "spam | normal | urgent". Use T = 0 — the same input must always produce the same label, otherwise you can't write tests.
Creative writing: suggest blog intros. Use T = 0.8 — each call yields a fresh opening with different tone. With T = 0 you'd get the same cliché every time.
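Both use cases come down to the same sampling loop. Here is a minimal toy sampler (not a real API call; the token probabilities are invented) showing why T = 0 is testable and T = 0.8 is not:

```python
import random

def sample_token(token_probs, temperature, rng):
    """token_probs maps token -> probability. T = 0 means greedy
    argmax; otherwise reweight each p as p**(1/T), renormalize,
    and draw one token at random."""
    if temperature == 0:
        return max(token_probs, key=token_probs.get)
    weights = {t: p ** (1.0 / temperature) for t, p in token_probs.items()}
    r = rng.random() * sum(weights.values())
    cumulative = 0.0
    for token, weight in weights.items():
        cumulative += weight
        if r <= cumulative:
            return token

probs = {"time": 0.70, "morning": 0.05, "fish": 0.001}
rng = random.Random()
labels = [sample_token(probs, 0, rng) for _ in range(3)]    # always "time": assertable in tests
drafts = [sample_token(probs, 0.8, rng) for _ in range(3)]  # varies from call to call
```

The classification case corresponds to `labels` (three identical results, so an exact-match test passes every run); the blog-intro case corresponds to `drafts`, where variation is the feature.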
- Deterministic output required: classification, extraction, structured output → T ≈ 0
- Factual Q&A → T ≈ 0.2
- General chat, help, explanation → T ≈ 0.7
- Creative writing, brainstorm, variety → T ≈ 0.9–1.2
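The defaults above can live in code as a lookup table. The task names and the helper below are hypothetical, just one way to encode the starting points before tuning per use case:

```python
# Hypothetical presets mirroring the defaults listed above.
TEMPERATURE_PRESETS = {
    "classification": 0.0,  # deterministic output required
    "extraction": 0.0,
    "factual_qa": 0.2,
    "chat": 0.7,            # general help and explanation
    "creative": 1.0,        # brainstorming, variety
}

def pick_temperature(task: str) -> float:
    """Return a starting temperature; unknown tasks fall back to balanced."""
    return TEMPERATURE_PRESETS.get(task, 0.7)
```

Treat these as starting points, not rules: measure output quality on your own task and adjust.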
- Setting T = 0 and complaining 'why is it always the same answer' (that's the point)
- Setting T = 2 and complaining 'the answer is gibberish' (too high = nonsense token stream)
- Treating it as a 'creativity' knob: genuinely creative ideas come from prompts and examples; T only fine-tunes how varied the output is
Temperature ≠ creativity
Higher T gives more varied output, not more creative output. For creative results, use a strong prompt with examples first; tweak T only after.
Forgetting seed for reproducibility
T = 0 alone isn't enough on some models: for reproducible output you may also need a seed parameter (OpenAI supports one, Anthropic doesn't), and even then results can shift when the provider updates the backend.
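What the seed buys you is easiest to see locally. This sketch uses a seeded random generator with a made-up distribution; hosted APIs expose the same idea as a request parameter:

```python
import random

def sample_sequence(seed, temperature=0.8, steps=5):
    """Seeded toy sampler: the same seed and T always yield the
    same token sequence, even though sampling is random."""
    rng = random.Random(seed)
    probs = {"time": 0.70, "morning": 0.05, "fish": 0.001}
    out = []
    for _ in range(steps):
        weights = {t: p ** (1.0 / temperature) for t, p in probs.items()}
        r = rng.random() * sum(weights.values())
        cumulative = 0.0
        for token, weight in weights.items():
            cumulative += weight
            if r <= cumulative:
                out.append(token)
                break
    return out

# Same seed -> identical sequence; a fresh unseeded rng would not give this.
run_a = sample_sequence(42)
run_b = sample_sequence(42)
```

`run_a == run_b` holds every time, which is exactly the reproducibility guarantee a seed parameter is meant to provide on top of a fixed temperature.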
Hallucinations spike at high T
T > 1.0 deliberately samples low-probability tokens, which are more likely to be factually wrong, so hallucinations become noticeably more frequent.