Do you need AI, ML, or just an LLM?
How to decide between AI, classical machine learning, and a large language model for a new feature — a decision flow with real scenarios.
The setup: every hammer looks the same
Most "let's add AI" meetings stumble because three concepts get conflated: AI, machine learning (ML), and large language models (LLMs). They're nested:
- AI is the broadest umbrella — anything making a computer act "intelligent", including rule systems.
- ML is a subset of AI: systems that learn from data.
- LLMs are a subset of ML (specifically deep learning): models trained on huge text corpora that predict language (GPT, Claude, Gemini).
This guide makes the choice concrete when designing a feature. Often the right answer isn't "all three" — it's "the simplest one that works", and it's frequently not even an LLM.
Decision flow
Walk through these in order:
Q1: Does a fixed rule solve it?
"Flag for manual review if order amount > 10,000 TRY."
This isn't AI, ML, or an LLM — it's a simple if/else. Two lines of code. You're applying known business logic, not making a prediction.
Rule: if business logic is clear, fixed, and easy to write, don't touch AI. Fewer surprises, easier tests, lower cost.
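The order-amount rule above really is just code; a minimal sketch (function name and threshold taken straight from the example):

```python
REVIEW_THRESHOLD_TRY = 10_000

def flag_for_review(order_amount_try: float) -> bool:
    # Pure business logic: no model, no training data, no inference cost.
    return order_amount_try > REVIEW_THRESHOLD_TRY

print(flag_for_review(15_000))  # True
print(flag_for_review(500))     # False
```

Trivial to test, free to run, and it never hallucinates.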
Q2: Predicting something numeric/categorical?
"Will this user churn in 30 days?" "What's the home's sale price?" "Is this transaction fraud?"
This is classical ML territory. Train a model on labeled history (gradient boosting, logistic regression, random forest); it captures structure in tabular data and generalizes.
Rule: when you have historical data and a clear target on tabular structure, classical ML was built for exactly this. Per prediction, calling an LLM can be orders of magnitude more expensive and is usually less accurate.
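To make "learning from data" concrete, here is a deliberately tiny from-scratch example: a decision stump that learns one threshold on one feature. The feature, data, and helper name are invented for illustration; a real project would use a library model (gradient boosting, logistic regression) instead.

```python
def fit_stump(values, labels):
    """Toy 'training': find the single threshold on one numeric feature
    that best separates the two classes."""
    best = (None, -1.0)  # (threshold, accuracy)
    for t in sorted(set(values)):
        preds = [x > t for x in values]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best[1]:
            best = (t, acc)
    return best

# days_since_last_login -> churned within 30 days?
days = [1, 2, 3, 30, 45, 60]
churn = [False, False, False, True, True, True]
threshold, acc = fit_stump(days, churn)
```

The point is the shape of the problem: labeled history in, a learned decision rule out. Gradient boosting does the same thing with hundreds of such splits combined.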
Q3: Unstructured data — images, audio?
"Identify objects in this photo." "Transcribe this speech." "Is there a tumor in this MRI?"
This is deep learning's home turf — specialized architectures (CNNs, RNNs, Vision Transformers) and pre-trained models such as Whisper for speech. Use pre-trained models whenever possible.
Rule: for images, audio, video, deep learning. LLMs are rarely the best choice; multimodal LLMs are an expensive general-purpose fallback.
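The core operation these architectures stack is convolution. A hand-rolled 1D version shows the idea (in practice you load a pre-trained model rather than writing this yourself):

```python
def conv1d(signal, kernel):
    """Valid-mode 1D convolution (strictly, cross-correlation, which is
    what deep learning frameworks compute): slide the kernel along the
    signal and take dot products."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A [-1, 1] kernel is a tiny edge detector: it lights up where the
# signal jumps, and is silent where it is flat.
signal = [0, 0, 0, 5, 5, 5]
edges = conv1d(signal, [-1, 1])
print(edges)  # [0, 0, 5, 0, 0]
```

A CNN learns thousands of such kernels from data instead of hard-coding them, and stacks them into layers.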
Q4: Natural language?
"Summarize this customer review." "Answer questions over this long doc." "Translate this from English to Turkish." "Understand intent and route the workflow."
This is the LLM sweet spot. Decades of NLP research are baked into GPT, Claude, and Gemini for understanding, generating, and transforming language. An API call beats training your own model in 95% of scenarios.
But: if you need a structured answer (JSON, class label, number), evaluate plain ML first. Speed, cost, explainability often favor ML.
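One way to keep that door open is to put the LLM behind a thin, validated function. A hedged sketch — `call_llm` is a stand-in for whatever provider client you use (stubbed here), and the prompt and labels are invented:

```python
import json

def classify_sentiment(review: str, call_llm) -> str:
    """Ask the model for a constrained JSON answer, then validate it.
    `call_llm` is injected, so an LLM can later be swapped for a
    classical classifier without touching any caller."""
    prompt = (
        'Classify the sentiment of this review as positive, negative, '
        'or neutral. Reply as JSON: {"label": "..."}\n\n' + review
    )
    data = json.loads(call_llm(prompt))
    label = data["label"]
    if label not in {"positive", "negative", "neutral"}:
        raise ValueError(f"unexpected label: {label}")
    return label

# Stub standing in for a real provider client, for illustration only:
def fake_llm(prompt):
    return '{"label": "positive"}'

result = classify_sentiment("Loved it, fast shipping!", fake_llm)
```

If the ML-first evaluation later wins, only the injected function changes.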
Q5: Coordinating multiple steps?
"Understand the question, query the DB, summarize the result, draft an email."
This is AI agents territory. The LLM no longer produces one answer — it manages a chain of decisions: which tool to call, how to interpret results, what to do next. MCP servers, function calling, ReAct loops are the building blocks.
Rule: multi-step, conditional flows? Consider an LLM agent. But first: is this really an agent, or would a plain automation (cron + Lambda + API) do? Agents are expensive, error-prone, and hard to debug — pick cron when cron suffices.
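The chain-of-decisions loop is small enough to sketch. This is a minimal ReAct-style skeleton with everything invented for illustration: the planner is a scripted stub where a real agent would call an LLM, and the tool names are made up:

```python
def run_agent(goal, plan_next, tools, max_steps=5):
    """Minimal agent loop: the planner (normally an LLM) looks at the
    goal plus the history so far and either picks a tool or finishes."""
    history = []
    for _ in range(max_steps):
        action = plan_next(goal, history)   # e.g. {"tool": "query_db", ...}
        if action["tool"] == "finish":
            return action["answer"]
        result = tools[action["tool"]](action["args"])
        history.append((action, result))
    raise RuntimeError("agent did not finish within max_steps")

# Scripted planner standing in for the LLM:
def scripted_planner(goal, history):
    if not history:
        return {"tool": "query_db", "args": "SELECT revenue"}
    return {"tool": "finish", "answer": f"Revenue was {history[-1][1]}"}

tools = {"query_db": lambda query: 42}
answer = run_agent("summarize revenue", scripted_planner, tools)
```

Note the `max_steps` guard: even this toy needs a hard stop, which is a hint at why real agents are hard to debug.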
Walk through real scenarios
Scenario A: e-commerce search box
Need: user types "red summer dress size 36"; results should be relevant.
Wrong answer: "send every query to GPT to extract intent". Costly, slow, and overkill for keyword queries.
Right answer: classical search (BM25 or OpenSearch) + product filters + (optional) embedding-based semantic search. ML, mostly classical IR.
When does the LLM enter? For abstract queries like "practical T-shirts for new moms", an LLM step can interpret intent. But the core search shouldn't be an LLM.
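The classical core can be surprisingly small. A toy term-overlap scorer (data invented; production systems use BM25, e.g. via OpenSearch, which also weights by term rarity and document length):

```python
def score(query: str, doc: str) -> int:
    """Toy relevance: count how many query terms appear in the document."""
    doc_terms = set(doc.lower().split())
    return sum(term in doc_terms for term in query.lower().split())

products = [
    "red summer dress",
    "blue winter coat",
    "red running shoes",
]
query = "red summer dress size 36"
ranked = sorted(products, key=lambda p: score(query, p), reverse=True)
print(ranked[0])  # red summer dress
```

Filters, synonyms, and embedding-based semantic search layer on top of this core; the optional LLM step sits in front of it, not inside it.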
Scenario B: customer-support bot
Need: user asks a question, the bot answers from product docs.
Right answer: RAG — embed docs into a vector DB, retrieve relevant chunks, LLM drafts the answer. The LLM is in its element: language understanding + generation. Classical ML can't do this; pre-trained transformers are the right tool.
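The retrieval half of RAG reduces to "embed, then find the nearest chunk". A toy sketch using a bag-of-words "embedding" and cosine similarity — a real pipeline would call an embedding model and a vector DB, and the documents here are invented:

```python
import math
from collections import Counter

def tokens(text):
    return text.lower().replace(".", " ").replace(",", " ").split()

def embed(text):
    """Toy embedding: word counts. Real RAG uses a learned embedding model."""
    return Counter(tokens(text))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

docs = [
    "To reset your password, open account settings.",
    "Shipping takes 3 to 5 business days.",
]
question = "how do I reset my password"
best = max(docs, key=lambda d: cosine(embed(question), embed(d)))
```

`best` is the chunk you hand to the LLM as context; the LLM's job is only to draft the answer from it.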
Scenario C: credit-application scoring
Need: score new applications as low/medium/high risk.
Right answer: classical ML — gradient boosting or logistic regression. High accuracy, low latency, and explainable, which is usually a regulatory requirement. An opaque LLM is slow and hard to defend to a regulator.
Scenario D: PDF summarizer
Need: turn a 50-page report into one page.
Right answer: LLM, directly. Summarization is exactly what LLMs are great at. API call, write a prompt, done.
Scenario E: fraud detection
Need: score 10K transactions per second in real time.
Right answer: classical ML (gradient boosting). Sub-millisecond predictions, high accuracy, explicit feature importance. An LLM's latency, typically hundreds of milliseconds, is a non-starter at that scale.
Scenario F: personal-assistant agent
Need: "find a restaurant tomorrow at 10 if I'm free, add to my calendar, text my partner."
Right answer: LLM agent + MCP servers (calendar, SMS, restaurant API). Multi-step, conditional, natural-language interface — exactly what agents were designed for.
Quick decision table
| Task | Pick |
|---|---|
| Apply fixed business rule | if/else |
| Tabular prediction (number/class) | Classical ML |
| Image / audio recognition | Deep learning |
| NL summarize / generate / translate | LLM |
| Structured output (JSON, class) | ML first; LLM if necessary |
| Multi-step workflow | LLM agent (justified) |
| Doc-grounded Q&A | RAG (LLM + embeddings) |
| Rare/imbalanced detection | Anomaly detection |
| Time-based prediction | Forecasting |
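Read as code, the table is just a lookup. The keys below are invented labels for the rows above, purely for illustration:

```python
# The decision table as a literal mapping from task category to tool.
PICK = {
    "fixed_rule":        "if/else",
    "tabular":           "classical ML",
    "image_audio":       "deep learning",
    "language":          "LLM",
    "structured_output": "ML first; LLM if necessary",
    "multi_step":        "LLM agent (justified)",
    "doc_qa":            "RAG (LLM + embeddings)",
    "rare_events":       "anomaly detection",
    "time_series":       "forecasting",
}

def pick_tool(task_category: str) -> str:
    # An unknown category means the problem itself isn't defined yet.
    return PICK.get(task_category, "clarify the problem first")
```

The fallback is the real lesson: if a feature doesn't fit a row, the missing piece is problem definition, not a bigger model.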
Cost / risk lens
Per category:
- Cost: if/else is free; ML has a one-time training cost plus cheap inference; LLMs charge per token on every call.
- Latency: if/else in microseconds; ML in milliseconds; LLMs in 100 ms to 2 s.
- Explainability: if/else is perfect; classical ML is good; LLMs are largely opaque.
- Regulatory: in healthcare and finance, opaque LLMs are problematic; explainable ML is preferred.
- Maintenance: if/else is minimal; ML needs periodic retraining; LLMs drift as providers update models.
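The cost gap is easy to estimate on the back of an envelope. All prices below are assumptions for illustration, not quotes from any provider:

```python
# Assumed workload: 1M classifications per day.
requests_per_day = 1_000_000

# Assumed LLM pricing: ~500 input + 10 output tokens per call,
# at $0.50 per 1M input tokens and $1.50 per 1M output tokens.
llm_cost_per_day = requests_per_day * (500 * 0.50 + 10 * 1.50) / 1_000_000

# Assumed self-hosted classical model: one small instance at $3/day
# comfortably handles the load.
ml_cost_per_day = 3.0

print(f"LLM: ${llm_cost_per_day:.2f}/day")  # LLM: $265.00/day
print(f"ML:  ${ml_cost_per_day:.2f}/day")   # ML:  $3.00/day
```

Swap in your own prices and volumes; the point is that per-token pricing multiplied by traffic dominates quickly.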
Common anti-patterns
"Just use GPT for everything"
The classic beginner mistake: an API call where an if/else would do, an LLM for every tabular problem. Expensive and slow. Try the simplest tool first.
"A feature without AI is unprofessional"
Features don't have to include AI. When stakeholders demand "more AI", reframe to "what's the actual business problem?". AI is a tool, not a solution.
"Pre-trained model exists, but let's train our own"
Off-the-shelf is right 99% of the time. Training from scratch is expensive and slow. Custom models only make sense for genuine long-tail domain needs — and even then, start with fine-tuning.
Bottom line
For any new feature, walk the flow:
- Fixed rule? → if/else
- Tabular prediction? → ML
- Unstructured data? → deep learning
- Natural language? → LLM
- Multi-step workflow? → agent
This order surfaces the simplest tool that fits. Before reaching for AI at all, ask whether you actually need it.
Continue reading
- AI — the broadest umbrella: making computers act "intelligent".
- Machine Learning — systems that learn from data.
- LLM — the modern center: large language models.
- RAG — the standard way to feed knowledge without retraining.
- Token Reduction Techniques — keeping LLM cost down at scale.