LLM
Large Language Model
A large AI model, trained on vast text corpora, that learns to predict the next token and can hold conversations.
An LLM is a neural network with billions of parameters. Its single core job: probabilistically predict the next token given some text. When it does that well enough, capabilities like answering, writing, and coding emerge.
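The "predict the next token" job can be sketched as a toy conditional-probability table. This is an illustration of the idea, not a real model; actual LLMs learn these distributions implicitly in billions of parameters.

```python
import random

# Toy stand-in for a language model: P(next_token | context) as a lookup
# table. The contexts and probabilities here are made up for illustration.
NEXT_TOKEN_PROBS = {
    "the cat": {"sat": 0.6, "ran": 0.3, "slept": 0.1},
    "cat sat": {"on": 0.9, "quietly": 0.1},
}

def predict_next(context: str) -> str:
    """Sample the next token from the model's distribution for this context."""
    dist = NEXT_TOKEN_PROBS[context]
    tokens, probs = zip(*dist.items())
    return random.choices(tokens, weights=probs)[0]

print(predict_next("the cat"))  # most often "sat", sometimes "ran" or "slept"
```

Because generation samples from a distribution, the same prompt can produce different outputs, which is also why determinism is hard to guarantee.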
Almost all are built on the Transformer architecture. Training is typically three-stage: pretraining on trillions of tokens (learning language structure), supervised fine-tuning on human-written examples, then RLHF (reinforcement learning from human feedback) to align the model with human preferences.
Major families: GPT (OpenAI), Claude (Anthropic), Gemini (Google), Llama (Meta, open weights), DeepSeek, Mistral, Qwen. Sizes range from about 2B to a reported ~2T parameters; performance depends on size, data quality, and training technique.
Think of phone keyboard autocomplete — that "next word" suggestion. An LLM is autocomplete a million times more powerful, with much deeper context. It can complete not just words, but paragraphs and entire functions. But it's still guessing — it has no real beliefs or facts.
A law firm sends a contract to Claude for summarization. The API call:
system: "You are a legal assistant. Reply in Turkish, numbered."
user: [10-page contract text] + "Summarize this in 5 bullets."
Claude tokenizes the input (~8000 tokens), generates each next token by probability, and produces a 5-bullet Turkish summary. There's no "understanding" — just statistical prediction. But the prediction is so good the output reads like a human's.
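The scenario above maps onto a request body shaped like Anthropic's Messages API. This is a sketch of the payload only (no network call); the model name is an assumption, and a real call would go through the official SDK with an API key.

```python
# Sketch of the contract-summary request. The contract text is elided;
# "claude-sonnet-4" is an assumed/illustrative model identifier.
contract_text = "..."  # the 10-page contract would go here

request = {
    "model": "claude-sonnet-4",
    "max_tokens": 1024,
    "system": "You are a legal assistant. Reply in Turkish, numbered.",
    "messages": [
        {
            "role": "user",
            "content": contract_text + "\n\nSummarize this in 5 bullets.",
        },
    ],
}
```

Note the split: persistent instructions live in `system`, while the document and the task go in the `user` message.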
Good for:
- Natural-language tasks: summary, translation, rewriting, explanation
- Code generation/explanation (Copilot, Cursor)
- Prototyping classification/extraction (LLM first, dedicated model later)
- Chat assistants, support bots
- Turning structured data into prose (reports, emails)
Not suitable for:
- Precise math/numeric computation (use a calculator/code-execution tool)
- Real-time data lookups (need RAG or function calling)
- When fully deterministic output is required (even T=0 isn't 100% guaranteed)
- Low-latency critical paths (LLM responses are ~1–30s)
Common pitfalls:
Treating LLMs as fact sources
An LLM reproduces statistical patterns from its training data; it has no notion of truth. 'What happened in year X?' can yield a confident wrong answer. For critical information, use RAG or web search.
Underestimating token cost
1000 output tokens ≈ 750 words. 10K users × 5 messages × 1000 tokens = 50M tokens daily. At GPT-4 prices that's serious money. Plan for caching and smaller models.
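The arithmetic above is worth making explicit. The per-token price below is an assumed placeholder; plug in the current rate for whatever model you use.

```python
# Back-of-envelope daily token cost for the scenario above.
users = 10_000
messages_per_user = 5
tokens_per_message = 1_000

daily_tokens = users * messages_per_user * tokens_per_message
print(daily_tokens)  # 50_000_000 tokens/day

price_per_million = 30.0  # USD per 1M output tokens -- ASSUMED, check current pricing
daily_cost = daily_tokens / 1_000_000 * price_per_million
print(f"${daily_cost:,.0f}/day")  # $1,500/day at the assumed rate
```

Even a rough model like this makes the case for caching and routing to cheaper models before launch, not after the first invoice.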
Believing one model fits all
Use a cheap, fast model for classification, a mid-tier model for chat, and a premium model only for hard tasks. Hybrid routing can cut costs substantially (often cited at ~70%).
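Hybrid routing can be as simple as a lookup from task type to model tier. The model names and task taxonomy below are illustrative placeholders, not real identifiers.

```python
# Sketch of hybrid model routing: pick the cheapest model that can
# handle the task; fall back to the premium tier for unknown/hard tasks.
ROUTES = {
    "classification": "small-fast-model",
    "chat": "mid-tier-model",
    "reasoning": "premium-model",
}

def route(task_type: str) -> str:
    """Return the model tier for a task, defaulting to premium."""
    return ROUTES.get(task_type, "premium-model")

print(route("classification"))  # small-fast-model
print(route("contract-review"))  # premium-model (unknown task -> safe default)
```

In production the routing decision itself is often made by a cheap classifier, so the expensive model only sees the traffic that needs it.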