AI Dictionary
Beginner · ~2 min read · #llm #language-model

LLM

Large Language Model

A massive AI model trained on huge text corpora that learns to predict the next token and can hold conversations.

[Diagram: prompt (INPUT) → LLM (billions of parameters, next-token predictor) → OUTPUT]
Definition

An LLM is a neural network with billions of parameters. Its single core job: probabilistically predict the next token given some text. When it does that well enough, capabilities like answering, writing, and coding emerge.

Almost all are built on the Transformer architecture. Training is typically three-stage: pretraining on trillions of tokens (learn language structure), fine-tuning on human examples, then RLHF to align with human preferences.

Major families: GPT (OpenAI), Claude (Anthropic), Gemini (Google), Llama (Meta — open weights), DeepSeek, Mistral, Qwen. Sizes range 2B–2T parameters; performance depends on size, data, and training tricks.

Analogy

Think of phone keyboard autocomplete — that "next word" suggestion. An LLM is autocomplete a million times more powerful, with much deeper context. It can complete not just words, but paragraphs and entire functions. But it's still guessing — it has no real beliefs or facts.
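The autocomplete analogy can be made concrete with a toy bigram model: count which word follows which in a tiny corpus, then suggest the most frequent successor. This is a deliberately simplified sketch — real LLMs use neural networks over subword tokens, not word counts — but the "predict the next thing from what came before" idea is the same.

```python
from collections import Counter, defaultdict

# Toy autocomplete: a bigram model over a tiny hand-made corpus.
# Real LLMs replace these counts with a neural network over tokens.
corpus = "the cat sat on the mat the cat ate the fish".split()

nexts = defaultdict(Counter)
for prev, cur in zip(corpus, corpus[1:]):
    nexts[prev][cur] += 1  # count how often `cur` follows `prev`

def suggest(word):
    """Return the most frequent next word after `word`, like phone autocomplete."""
    counts = nexts[word]
    return counts.most_common(1)[0][0] if counts else None

print(suggest("the"))  # "cat" — it follows "the" most often in the corpus
```

The gap between this and an LLM is scale and context: the bigram model sees one word back, while an LLM conditions on thousands of tokens.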

Real-world example

A law firm sends a contract to Claude for summarization. The API call: system: "You are a legal assistant. Reply in Turkish, numbered." user: [10-page contract text] + "Summarize this in 5 bullets."

Claude tokenizes the input (~8000 tokens), generates each next token by probability, and produces a 5-bullet Turkish summary. There's no "understanding" — just statistical prediction. But the prediction is so good the output reads like a human's.
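The law firm's call from the example might assemble a request shaped roughly like the payload below. This is a hedged sketch: the field names follow the general shape of chat-style LLM APIs, and the model id, `max_tokens` value, and `build_request` helper are illustrative assumptions, not Anthropic's exact SDK.

```python
# Sketch of the request from the example above. Field names mirror
# chat-style LLM APIs; the model id and helper name are assumptions.
contract_text = "..."  # the 10-page contract body goes here

def build_request(document: str) -> dict:
    return {
        "model": "claude-sonnet",  # assumed model id for illustration
        "system": "You are a legal assistant. Reply in Turkish, numbered.",
        "messages": [
            {"role": "user",
             "content": f"{document}\n\nSummarize this in 5 bullets."}
        ],
        "max_tokens": 1024,  # cap on generated output tokens
    }

req = build_request(contract_text)
print(req["messages"][0]["role"])  # user
```

Note the split: persistent instructions go in the system prompt, while the document and task travel in the user message.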

A deeper look
LLMs generate one token at a time. Given the text so far — "The capital of France is" — the model assigns probabilities to candidate next tokens (Paris 0.92, Lyon 0.05, Berlin 0.02, Roma 0.01), appends the most likely one, and then predicts the next token from the updated text "The capital of France is Paris". This loop repeats for every token until the sentence ends.
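The token-by-token loop can be sketched as a toy greedy decoder. The probability table below is hard-coded for illustration — a real LLM computes it with a neural network at every step — but the loop structure (predict, pick the top token, append, repeat) is the real decoding procedure.

```python
# Toy greedy decoder. At each step the "model" (a hard-coded table here,
# a neural network in a real LLM) returns next-token probabilities;
# we append the most likely token and repeat until no prediction exists.
TABLE = {
    "The capital of France is":
        {"Paris": 0.92, "Lyon": 0.05, "Berlin": 0.02, "Roma": 0.01},
    "The capital of France is Paris":
        {".": 0.97, ",": 0.03},
}

def generate(text: str) -> str:
    while text in TABLE:
        probs = TABLE[text]
        best = max(probs, key=probs.get)  # greedy: take the most likely token
        # join with a space for word tokens, directly for punctuation
        text = text + " " + best if best.isalpha() else text + best
    return text

print(generate("The capital of France is"))  # The capital of France is Paris.
```

Real systems usually sample from these probabilities instead of always taking the top token; the temperature setting controls how adventurous that sampling is.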
When to use
  • Natural-language tasks: summary, translation, rewriting, explanation
  • Code generation/explanation (Copilot, Cursor)
  • Prototyping classification/extraction (LLM first, dedicated model later)
  • Chat assistants, support bots
  • Turning structured data into prose (reports, emails)
When not to use
  • Precise math/numeric computation (use a calculator/code-execution tool)
  • Real-time data lookups (need RAG or function calling)
  • When fully deterministic output is required (even temperature = 0 isn't fully deterministic in practice)
  • Low-latency critical paths (LLM responses are ~1–30s)
Common pitfalls

Treating LLMs as fact sources

An LLM predicts statistics learned from its training data, not verified truth. Asking "What happened in year X?" may yield confident wrong answers. For critical information, use RAG or web search.

Underestimating token cost

1000 output tokens ≈ 750 words. 10K users × 5 messages × 1000 tokens = 50M tokens daily. At GPT-4 prices that's serious money. Plan for caching and smaller models.
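The arithmetic above can be turned into a small estimator. The per-million-token price below is a placeholder assumption — plug in your provider's current rate.

```python
# Token-cost estimator for the scenario in the text.
# The $30/M output-token price is an illustrative assumption.
def daily_output_tokens(users: int, msgs_per_user: int, tokens_per_msg: int) -> int:
    return users * msgs_per_user * tokens_per_msg

def daily_cost_usd(tokens: int, price_per_million: float) -> float:
    return tokens / 1_000_000 * price_per_million

tokens = daily_output_tokens(10_000, 5, 1_000)
print(tokens)                      # 50_000_000 tokens/day, as in the text
print(daily_cost_usd(tokens, 30))  # 1500.0 — $1,500/day at the assumed rate
```

Even a rough estimator like this makes the case for caching and smaller models obvious before the first invoice arrives.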

Believing one model fits all

Use a cheap/fast model for classification, mid-tier for chat, premium for hard tasks. Hybrid routing cuts cost ~70%.
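The hybrid-routing idea above can be sketched as a tiny router: send each request to the cheapest tier that can handle it. The tier names and the keyword-based classifier are illustrative assumptions, not a production routing policy — real routers often use a small model to classify the request.

```python
# Hedged sketch of hybrid model routing. Tier names and the
# keyword classifier are assumptions for illustration only.
TIERS = {
    "classification": "small-fast-model",  # cheap tier for simple labeling
    "chat": "mid-tier-model",              # default conversational tier
    "hard": "premium-model",               # expensive tier for hard tasks
}

def route(task: str) -> str:
    t = task.lower()
    if any(k in t for k in ("classify", "label", "extract")):
        return TIERS["classification"]
    if any(k in t for k in ("prove", "legal analysis", "complex")):
        return TIERS["hard"]
    return TIERS["chat"]  # everything else goes to the mid tier

print(route("classify this ticket"))   # small-fast-model
print(route("summarize our meeting"))  # mid-tier-model
```

The savings come from the traffic mix: if most requests are simple, most of them never touch the premium model.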