AI Agent
An LLM that takes actions
An autonomous system that uses an LLM as its brain to call tools, plan, and reach a goal step by step.
A plain LLM returns one response and stops. An agent takes a goal, plans, calls tools (web search, code execution, APIs, files), reads the results, updates its plan, and continues. The loop runs until the goal is met.
The core agent loop:
1. Plan: the LLM looks at the current state and decides what to do.
2. Act: it calls a tool (function calling).
3. Observe: it reads the tool's output.
4. Reflect: are we closer to the goal, or do we need a different path?
5. Repeat.
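The loop above can be sketched in a few lines of Python. This is a minimal illustration, not any framework's real API: `llm_decide` and `run_tool` are hypothetical stand-ins for a model call and a tool dispatcher.

```python
# Minimal agent-loop sketch. llm_decide() and run_tool() are hypothetical
# stand-ins for a real LLM call and a real tool dispatcher.
def run_agent(goal, llm_decide, run_tool, max_turns=20):
    history = [{"role": "user", "content": goal}]
    for _ in range(max_turns):
        step = llm_decide(history)          # Plan: model picks the next action
        if step["type"] == "final":         # Goal reached: return the answer
            return step["content"]
        result = run_tool(step["tool"], step["args"])        # Act
        history.append({"role": "tool", "content": result})  # Observe
        # Reflect happens implicitly: the next llm_decide() sees the result
    raise RuntimeError("turn limit reached without finishing")
```

The turn cap matters: without it, a stuck agent loops forever (see the pitfalls below).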
Frameworks and tooling: LangGraph, CrewAI, AutoGen, Anthropic's Computer Use. Production examples: Claude Code, Cursor Agent, GitHub Copilot Workspace, Devin, Manus, most enterprise "AI assistants."
LLM = a consultant giving you one sentence of advice. Agent = an intern who actually goes and does it — opens the file, picks up the phone, writes the email, double-checks, and reports back "done." Sometimes turns down the wrong street, but doesn't forget the goal.
"Run all tests in this repo, fix the failing ones, open a PR" →
Claude Code (an agent):
1. Runs npm test via the bash tool.
2. Sees 3 tests failed.
3. Reads each affected file with the read tool.
4. Spots the bug, fixes it with the edit tool.
5. Runs npm test again — all green.
6. git commit + gh pr create to open the PR.
7. Tells the user "done, PR #123 is up."
The user types no intermediate commands: the agent runs 10+ steps of the plan → tool call → observation loop on its own. Every step is an LLM call; every tool call hits an external system.
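Under the hood, each tool the agent can call is registered with a name and dispatched by that name. A hedged sketch of such a registry (the tool names mirror the steps above; this is illustrative, not Claude Code's actual internals):

```python
import subprocess

# Illustrative tool registry. Names mirror the walkthrough above
# (bash / read / edit); real products implement these differently.
TOOLS = {
    "bash": lambda args: subprocess.run(
        args["command"], shell=True, capture_output=True, text=True
    ).stdout,
    "read": lambda args: open(args["path"]).read(),
    "edit": lambda args: open(args["path"], "w").write(args["content"]),
}

def dispatch(tool_name, args):
    # Unknown tool names are returned as errors so the model can recover
    if tool_name not in TOOLS:
        return f"error: unknown tool {tool_name}"
    return TOOLS[tool_name](args)
```

The key design point: tool errors are fed back into the loop as observations rather than crashing the agent, so the model can try a different path.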
Use an agent when:
- Multi-step tasks (planning required)
- Open-ended problems — the user can't enumerate every step
- Tool use is essential (code execution, APIs, web search)
- Long-horizon tasks — workflows that may run for hours
Skip the agent when:
- Single-shot Q&A — agent overhead is unnecessary
- Security-critical actions (payment, deletion) — don't let the agent decide; require human approval
- High volume + low margin — every step is an LLM call = expensive
- Narrow, deterministic workflows — a traditional script is safer
Infinite loops
An agent that can't reach its goal may call the same tool over and over. Cap turns (usually 20–50), cap budget ($), and add loop detection.
Side effects from wrong tool calls
A wrong tool call can have real consequences: an agent might call 'send 1000 orders to the shipping API'. Add guardrails, a dry-run mode, and human-in-the-loop confirmation for risky tools.
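One common shape for this guardrail is an allowlist of risky tool names plus an explicit confirmation hook. A sketch, where the risky-tool names and the `confirm` callback are assumptions for illustration:

```python
# Hypothetical names for tools that must never run without approval.
RISKY_TOOLS = {"ship_orders", "delete_records", "charge_card"}

def guarded_call(tool, args, run_tool, confirm, dry_run=False):
    """Require human approval for risky tools; support a dry-run mode."""
    if dry_run:
        return f"[dry-run] would call {tool} with {args}"
    if tool in RISKY_TOOLS and not confirm(tool, args):
        return f"blocked: human declined {tool}"  # fed back as an observation
    return run_tool(tool, args)
```

In a real system `confirm` would prompt the user (or page an operator); here it is just a callback.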
Lack of visibility
If you can't see what the agent did, debugging is impossible. Log every step: plan, tool call, observation. Stream them to the user.
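Logging every step can be as simple as appending one structured record per event and printing it as it happens. A minimal sketch:

```python
import json
import time

def log_step(log, kind, payload):
    """Append one structured record per plan / tool call / observation."""
    record = {"ts": time.time(), "kind": kind, "payload": payload}
    log.append(record)
    print(json.dumps(record))  # stream to the user (stdout here) in real time
    return record
```

One JSON line per step is enough to replay a run afterwards and see exactly where the agent went wrong.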