Do you need AI, ML, or just an LLM?
How to decide between AI, classical machine learning, and a large language model for a new feature — a decision flow with real scenarios.
The setup: every hammer looks the same
Most "let's add AI" meetings stumble because three concepts get conflated: AI, machine learning (ML), and large language models (LLMs). They're nested:
- AI is the broadest umbrella — anything making a computer act "intelligent", including rule systems.
- ML is a subset of AI: systems that learn from data.
- LLMs are a subset of ML (specifically deep learning): models trained on huge text corpora that predict language (GPT, Claude, Gemini).
This guide makes the choice concrete when designing a feature. Often the right answer isn't "all three" — it's "the simplest one that works", and it's frequently not even an LLM.
Decision flow
Walk through these in order:
Q1: Does a fixed rule solve it?
"Flag for manual review if order amount > 10,000 TRY."
This isn't AI, ML, or an LLM — it's a simple if/else. Two lines of code. You're applying known business logic, not making a prediction.
Rule: if business logic is clear, fixed, and easy to write, don't touch AI. Fewer surprises, easier tests, lower cost.
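The order-amount rule above really is just code; a minimal sketch (function name and threshold taken straight from the example):

```python
REVIEW_THRESHOLD_TRY = 10_000

def flag_for_review(order_amount_try: float) -> bool:
    # Pure business logic: no model, no training data, no inference cost.
    return order_amount_try > REVIEW_THRESHOLD_TRY

print(flag_for_review(15_000))  # True
print(flag_for_review(500))     # False
```

Trivial to test, free to run, and it never hallucinates.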
Q2: Predicting something numeric/categorical?
"Will this user churn in 30 days?" "What's the home's sale price?" "Is this transaction fraud?"
This is classical ML territory. Train a model on labeled history (gradient boosting, logistic regression, random forest); it captures structure in tabular data and generalizes.
Rule: when you have historical data and a clear target on tabular structure, classical ML was built for exactly this. Per prediction, calling an LLM can be orders of magnitude more expensive and is usually less accurate.
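To make "learning from data" concrete, here is a deliberately tiny from-scratch example: a decision stump that learns one threshold on one feature. The feature, data, and helper name are invented for illustration; a real project would use a library model (gradient boosting, logistic regression) instead.

```python
def fit_stump(values, labels):
    """Toy 'training': find the single threshold on one numeric feature
    that best separates the two classes."""
    best = (None, -1.0)  # (threshold, accuracy)
    for t in sorted(set(values)):
        preds = [x > t for x in values]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best[1]:
            best = (t, acc)
    return best

# days_since_last_login -> churned within 30 days?
days = [1, 2, 3, 30, 45, 60]
churn = [False, False, False, True, True, True]
threshold, acc = fit_stump(days, churn)
```

The point is the shape of the problem: labeled history in, a learned decision rule out. Gradient boosting does the same thing with hundreds of such splits combined.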
Q3: Unstructured data — images, audio?
"Identify objects in this photo." "Transcribe this speech." "Is there a tumor in this MRI?"
This is deep learning's home turf — specialized architectures (CNNs, RNNs, Vision Transformers) and pre-trained models such as Whisper for speech. Use pre-trained models whenever possible.
Rule: for images, audio, video, deep learning. LLMs are rarely the best choice; multimodal LLMs are an expensive general-purpose fallback.
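The core operation these architectures stack is convolution. A hand-rolled 1D version shows the idea (in practice you load a pre-trained model rather than writing this yourself):

```python
def conv1d(signal, kernel):
    """Valid-mode 1D convolution (strictly, cross-correlation, which is
    what deep learning frameworks compute): slide the kernel along the
    signal and take dot products."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A [-1, 1] kernel is a tiny edge detector: it lights up where the
# signal jumps, and is silent where it is flat.
signal = [0, 0, 0, 5, 5, 5]
edges = conv1d(signal, [-1, 1])
print(edges)  # [0, 0, 5, 0, 0]
```

A CNN learns thousands of such kernels from data instead of hard-coding them, and stacks them into layers.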
Q4: Natural language?
"Summarize this customer review." "Answer questions over this long doc." "Translate this from English to Turkish." "Understand intent and route the workflow."
This is the LLM sweet spot. Decades of NLP research are baked into GPT, Claude, and Gemini for understanding, generating, and transforming language. An API call beats training your own model in 95% of scenarios.
But: if you need a structured answer (JSON, class label, number), evaluate plain ML first. Speed, cost, explainability often favor ML.
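One way to keep that door open is to put the LLM behind a thin, validated function. A hedged sketch — `call_llm` is a stand-in for whatever provider client you use (stubbed here), and the prompt and labels are invented:

```python
import json

def classify_sentiment(review: str, call_llm) -> str:
    """Ask the model for a constrained JSON answer, then validate it.
    `call_llm` is injected, so an LLM can later be swapped for a
    classical classifier without touching any caller."""
    prompt = (
        'Classify the sentiment of this review as positive, negative, '
        'or neutral. Reply as JSON: {"label": "..."}\n\n' + review
    )
    data = json.loads(call_llm(prompt))
    label = data["label"]
    if label not in {"positive", "negative", "neutral"}:
        raise ValueError(f"unexpected label: {label}")
    return label

# Stub standing in for a real provider client, for illustration only:
def fake_llm(prompt):
    return '{"label": "positive"}'

result = classify_sentiment("Loved it, fast shipping!", fake_llm)
```

If the ML-first evaluation later wins, only the injected function changes.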
Q5: Coordinating multiple steps?
"Understand the question, query the DB, summarize the result, draft an email."
This is AI agents territory. The LLM no longer produces one answer — it manages a chain of decisions: which tool to call, how to interpret results, what to do next. MCP servers, function calling, ReAct loops are the building blocks.
Rule: multi-step, conditional flows? Consider an LLM agent. But first: is this really an agent, or would a plain automation (cron + Lambda + API) do? Agents are expensive, error-prone, and hard to debug — pick cron when cron suffices.
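The chain-of-decisions loop is small enough to sketch. This is a minimal ReAct-style skeleton with everything invented for illustration: the planner is a scripted stub where a real agent would call an LLM, and the tool names are made up:

```python
def run_agent(goal, plan_next, tools, max_steps=5):
    """Minimal agent loop: the planner (normally an LLM) looks at the
    goal plus the history so far and either picks a tool or finishes."""
    history = []
    for _ in range(max_steps):
        action = plan_next(goal, history)   # e.g. {"tool": "query_db", ...}
        if action["tool"] == "finish":
            return action["answer"]
        result = tools[action["tool"]](action["args"])
        history.append((action, result))
    raise RuntimeError("agent did not finish within max_steps")

# Scripted planner standing in for the LLM:
def scripted_planner(goal, history):
    if not history:
        return {"tool": "query_db", "args": "SELECT revenue"}
    return {"tool": "finish", "answer": f"Revenue was {history[-1][1]}"}

tools = {"query_db": lambda query: 42}
answer = run_agent("summarize revenue", scripted_planner, tools)
```

Note the `max_steps` guard: even this toy needs a hard stop, which is a hint at why real agents are hard to debug.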
Walk through real scenarios
Scenario A: e-commerce search box
Need: user types "red summer dress size 36"; results should be relevant.
Wrong answer: "send every query to GPT to extract intent". Costly, slow, and overkill for keyword queries.
Right answer: classical search (BM25 or OpenSearch) + product filters + (optional) embedding-based semantic search. ML, mostly classical IR.
When does the LLM enter? For abstract queries like "practical T-shirts for new moms", an LLM step can interpret intent. But the core search shouldn't be an LLM.
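The classical core can be surprisingly small. A toy term-overlap scorer (data invented; production systems use BM25, e.g. via OpenSearch, which also weights by term rarity and document length):

```python
def score(query: str, doc: str) -> int:
    """Toy relevance: count how many query terms appear in the document."""
    doc_terms = set(doc.lower().split())
    return sum(term in doc_terms for term in query.lower().split())

products = [
    "red summer dress",
    "blue winter coat",
    "red running shoes",
]
query = "red summer dress size 36"
ranked = sorted(products, key=lambda p: score(query, p), reverse=True)
print(ranked[0])  # red summer dress
```

Filters, synonyms, and embedding-based semantic search layer on top of this core; the optional LLM step sits in front of it, not inside it.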
Scenario B: customer-support bot
Need: user asks a question, the bot answers from product docs.
Right answer: RAG — embed docs into a vector DB, retrieve relevant chunks, LLM drafts the answer. The LLM is in its element: language understanding + generation. Classical ML can't do this; pre-trained transformers are the right tool.
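The retrieval half of RAG reduces to "embed, then find the nearest chunk". A toy sketch using a bag-of-words "embedding" and cosine similarity — a real pipeline would call an embedding model and a vector DB, and the documents here are invented:

```python
import math
from collections import Counter

def tokens(text):
    return text.lower().replace(".", " ").replace(",", " ").split()

def embed(text):
    """Toy embedding: word counts. Real RAG uses a learned embedding model."""
    return Counter(tokens(text))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

docs = [
    "To reset your password, open account settings.",
    "Shipping takes 3 to 5 business days.",
]
question = "how do I reset my password"
best = max(docs, key=lambda d: cosine(embed(question), embed(d)))
```

`best` is the chunk you hand to the LLM as context; the LLM's job is only to draft the answer from it.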
Scenario C: credit-application scoring
Need: score new applications as low/medium/high risk.
Right answer: classical ML — gradient boosting or logistic regression. High accuracy, low latency, and explainable, which is usually a regulatory requirement. An opaque LLM is slow and hard to defend to a regulator.
Scenario D: PDF summarizer
Need: turn a 50-page report into one page.
Right answer: LLM, directly. Summarization is exactly what LLMs are great at. API call, write a prompt, done.
Scenario E: fraud detection
Need: score 10K transactions per second in real time.
Right answer: classical ML (gradient boosting). Sub-millisecond predictions, high accuracy, explicit feature importance. An LLM's latency, typically hundreds of milliseconds, is a non-starter at that scale.
Scenario F: personal-assistant agent
Need: "find a restaurant tomorrow at 10 if I'm free, add to my calendar, text my partner."
Right answer: LLM agent + MCP servers (calendar, SMS, restaurant API). Multi-step, conditional, natural-language interface — exactly what agents were designed for.
Quick decision table
| Task | Pick |
|---|---|
| Apply fixed business rule | if/else |
| Tabular prediction (number/class) | Classical ML |
| Image / audio recognition | Deep learning |
| NL summarize / generate / translate | LLM |
| Structured output (JSON, class) | ML first; LLM if necessary |
| Multi-step workflow | LLM agent (justified) |
| Doc-grounded Q&A | RAG (LLM + embeddings) |
| Rare/imbalanced detection | Anomaly detection |
| Time-based prediction | Forecasting |
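Read as code, the table is just a lookup. The keys below are invented labels for the rows above, purely for illustration:

```python
# The decision table as a literal mapping from task category to tool.
PICK = {
    "fixed_rule":        "if/else",
    "tabular":           "classical ML",
    "image_audio":       "deep learning",
    "language":          "LLM",
    "structured_output": "ML first; LLM if necessary",
    "multi_step":        "LLM agent (justified)",
    "doc_qa":            "RAG (LLM + embeddings)",
    "rare_events":       "anomaly detection",
    "time_series":       "forecasting",
}

def pick_tool(task_category: str) -> str:
    # An unknown category means the problem itself isn't defined yet.
    return PICK.get(task_category, "clarify the problem first")
```

The fallback is the real lesson: if a feature doesn't fit a row, the missing piece is problem definition, not a bigger model.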
Cost / risk lens
Per category:
- Cost: if/else is free; ML has a one-time training cost plus cheap inference; LLMs charge per token on every call.
- Latency: if/else in microseconds; ML in milliseconds; LLMs in 100 ms to 2 s.
- Explainability: if/else is perfect; classical ML is good; LLMs are largely opaque.
- Regulatory: in healthcare and finance, opaque LLMs are problematic; explainable ML is preferred.
- Maintenance: if/else is minimal; ML needs periodic retraining; LLMs drift as providers update models.
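The cost gap is easy to estimate on the back of an envelope. All prices below are assumptions for illustration, not quotes from any provider:

```python
# Assumed workload: 1M classifications per day.
requests_per_day = 1_000_000

# Assumed LLM pricing: ~500 input + 10 output tokens per call,
# at $0.50 per 1M input tokens and $1.50 per 1M output tokens.
llm_cost_per_day = requests_per_day * (500 * 0.50 + 10 * 1.50) / 1_000_000

# Assumed self-hosted classical model: one small instance at $3/day
# comfortably handles the load.
ml_cost_per_day = 3.0

print(f"LLM: ${llm_cost_per_day:.2f}/day")  # LLM: $265.00/day
print(f"ML:  ${ml_cost_per_day:.2f}/day")   # ML:  $3.00/day
```

Swap in your own prices and volumes; the point is that per-token pricing multiplied by traffic dominates quickly.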
Common anti-patterns
"Just use GPT for everything"
The classic beginner mistake: an API call where an if/else would do, an LLM for every tabular problem. Expensive and slow. Try the simplest tool first.
"A feature without AI is unprofessional"
Features don't have to include AI. When stakeholders demand "more AI", reframe to "what's the actual business problem?". AI is a tool, not a solution.
"Pre-trained model exists, but let's train our own"
Off-the-shelf is right 99% of the time. Training from scratch is expensive and slow. Custom models only make sense for genuine long-tail domain needs — and even then, start with fine-tuning.
Bottom line
For any new feature, walk the flow:
- Fixed rule? → if/else
- Tabular prediction? → ML
- Unstructured data? → deep learning
- Natural language? → LLM
- Multi-step workflow? → agent
This order surfaces the simplest tool that fits. Before reaching for AI at all, ask whether you actually need it.
Continue reading
- AI — the broadest umbrella: making computers act "intelligent".
- Machine Learning — systems that learn from data.
- LLM — the modern center: large language models.
- RAG — the standard way to feed knowledge without retraining.
- Token Reduction Techniques — keeping LLM cost down at scale.