AI Dictionary
Intermediate · ~2 min read · #cot #reasoning #prompting

Chain-of-Thought

CoT — Step-by-Step Reasoning

A prompting technique that encourages the model to write out intermediate reasoning steps before producing the final answer.

[Interactive demo: "Show your work". Question: "Roger has 5 balls. He buys 2 cans of 3 balls. How many now?" Thinking: 5 balls to start, plus 2 cans × 3 balls = 6 new balls, so 5 + 6 = 11. Answer: 11 balls. Caption: adding "let's think step by step" alone boosts accuracy significantly.]
Definition

Ask an LLM a complex math question and demand a direct answer — it'll usually get it wrong. Tell the same model to "think step by step" and accuracy jumps dramatically. This is Chain-of-Thought (CoT) prompting, discovered by Google researchers in 2022.

The mechanism: the model emits "thought" steps as tokens. Each step becomes context that shapes the next. Instead of doing arithmetic in its head, it's like writing on scratch paper.

Two flavors: few-shot CoT (include an example reasoning chain in the prompt) and zero-shot CoT (just append "Let's think step by step"). Zero-shot works surprisingly well.
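
Both flavors are plain string assembly. Here is a minimal sketch in Python; the few-shot worked example (the baker question) is invented for illustration and the prompts can be sent to any chat or completion model:

```python
# Two flavors of CoT prompting, built as plain strings.
question = "Roger has 5 balls. He buys 2 cans of 3 balls. How many now?"

# Zero-shot CoT: just append the magic phrase.
zero_shot_prompt = f"{question}\nLet's think step by step."

# Few-shot CoT: show one worked reasoning chain, then ask the real question.
# The baker example is a made-up demonstration chain.
few_shot_prompt = (
    "Q: A baker has 3 trays of 4 rolls each. How many rolls?\n"
    "A: 3 trays x 4 rolls = 12 rolls. The answer is 12.\n"
    "\n"
    f"Q: {question}\n"
    "A:"
)

print(zero_shot_prompt)
print(few_shot_prompt)
```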

Analogy

Math test: "do it in your head" → ~40% accuracy. "Show your work on scratch paper" → ~85% accuracy. Writing out the steps lets you catch mistakes, and the same mechanism helps the model.

Real-world example

Question: "Roger has 5 balls. He buys 2 cans of 3 balls. How many now?"

Direct answer: "10" (wrong)

CoT answer:
  • Roger has 5 balls.
  • 2 cans × 3 balls = 6 new balls.
  • Total: 5 + 6 = 11.
  • Answer: 11

Same model, same question. The only change: adding "let's think step by step." On the GSM8K math benchmark, this trick alone moves accuracy from ~18% to ~58%.

When to use
  • Multi-step math or logic problems
  • Code debugging, when the model needs to walk through execution (see the sketch after this list)
  • Decision trees, conditional logic (if-then inferences)
  • Breaking down complex instructions (first X, then Y, then Z)
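
For the debugging case, the prompt just needs to force an execution trace. A hedged sketch; the buggy function and the instruction wording are made up for illustration:

```python
# A CoT-style debugging prompt: ask the model to trace execution line by
# line. The function below is an invented example (it divides by zero
# for an empty list).
buggy_snippet = '''
def average(xs):
    total = 0
    for x in xs:
        total += x
    return total / len(xs)
'''

debug_prompt = (
    "Walk through this function line by line for the input [] and "
    "explain where and why it fails:\n" + buggy_snippet
)
print(debug_prompt)
```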
When not to use
  • Simple factual questions ('capital of England?') — wasteful
  • When you need a very short output (classification, single word)
  • Latency-critical paths — CoT means 3–10× longer output = more cost + slower
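
The cost point is easy to quantify. A back-of-envelope sketch using the 3–10× figure above; the per-token price is a placeholder, not any provider's real rate:

```python
# Back-of-envelope extra cost of CoT, using the 3-10x output growth
# mentioned above. PRICE_PER_OUTPUT_TOKEN is a hypothetical rate.
PRICE_PER_OUTPUT_TOKEN = 0.00001  # placeholder $/token

direct_tokens = 20               # a short direct answer
cot_tokens = direct_tokens * 10  # worst case of the 3-10x range

extra_cost = (cot_tokens - direct_tokens) * PRICE_PER_OUTPUT_TOKEN
print(f"extra cost per call: ${extra_cost:.5f}")  # -> $0.00180
```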
Common pitfalls

Redundant on reasoning models

Reasoning models like OpenAI's o1 or Claude's extended-thinking modes already do CoT internally. Telling them to 'think step by step' on top is redundant and sometimes backfires.

Steps can be wrong while the answer is right

The model can confabulate steps and still happen to land on the right answer. Don't treat CoT as a guarantee — verification still matters.
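
One common verification strategy (a separate technique called self-consistency, not part of basic CoT) is to sample several chains at nonzero temperature and majority-vote the final answers. A runnable sketch, with a faked sampler standing in for the real model call:

```python
import random
from collections import Counter

def sample_chain(question: str) -> str:
    # Placeholder for an LLM call at temperature > 0. The canned chains
    # below (one of them wrong) just make this sketch runnable.
    return random.choice([
        "5 + 2*3 = 11. Answer: 11",
        "5 balls plus 6 new balls = 11. Answer: 11",
        "2 cans of 3 is 5... Answer: 10",
    ])

def extract_answer(chain: str) -> str:
    # Take whatever follows the last "Answer:" marker.
    return chain.rsplit("Answer:", 1)[-1].strip()

def majority_answer(question: str, n: int = 5) -> str:
    # Self-consistency: majority vote over n sampled reasoning chains.
    answers = [extract_answer(sample_chain(question)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

print(majority_answer("Roger has 5 balls. He buys 2 cans of 3 balls. How many now?"))
```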

Token budget explosion

Each CoT answer adds 200–1000 tokens. In production, hide reasoning steps and surface only the final answer if UX matters.
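
A minimal way to do that: prompt the model to end with a fixed marker like "Answer: ...", then show the user only what follows it. The marker convention and regex are assumptions, not a standard API:

```python
import re

def final_answer(cot_output: str) -> str:
    # Surface only the text after the "Answer:" marker; fall back to the
    # full output if the model didn't follow the convention.
    match = re.search(r"Answer:\s*(.+)", cot_output)
    return match.group(1).strip() if match else cot_output.strip()

raw = "1. Start with 5.\n2. 2 x 3 = 6 new.\n3. 5 + 6 = 11.\nAnswer: 11 balls."
print(final_answer(raw))  # -> "11 balls."
```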