AI Dictionary
Intermediate · ~2 min read · #function-calling #tool-use #structured-output

Function Calling

Tool Use

The LLM emits a structured JSON object describing which function to call and with which arguments; your app executes the call and feeds the result back.

[Diagram: structured output → real action]
USER: "weather in Paris?"
LLM EMITS: get_weather(city: "Paris")
YOUR API: runs the call, returns { temp: 14, sky: "rain" }
LLM FORMATS REPLY: "Paris is 14°C with rain expected this afternoon."
The LLM doesn't run code; it asks YOUR app to, then reads the result.
Definition

LLMs can't run code, do math, or hit the internet. But they can do one crucial thing: emit structured JSON naming a function you've defined along with arguments. You tell it "these tools exist: get_weather(city), send_email(to, body)"; it produces get_weather(city: "Paris") based on the user's request. Running it is your job.

Loop: (1) give the model the tool definitions, (2) model emits a tool_call JSON, (3) you execute the function, (4) you return the result as a tool_result, (5) the model writes the final answer. The loop may run for multiple turns — that's the core mechanic of an agent.
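The five steps can be sketched as a driver loop. Everything model-side here is a stand-in: `fake_model` is a canned stub playing the LLM's role, and the message shapes are a simplified, OpenAI-style approximation.

```python
import json

def fake_model(messages):
    """Stub standing in for a real LLM call: it emits one tool_call,
    then, once it sees a tool result, emits the final answer."""
    if any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": "Paris is 14°C with rain."}
    return {"role": "assistant", "tool_calls": [
        {"id": "call_1", "name": "get_weather",
         "arguments": json.dumps({"city": "Paris"})}]}

def get_weather(city):
    return {"temp": 14, "sky": "rain"}   # canned data for the sketch

TOOLS = {"get_weather": get_weather}     # step (1): the tools you defined

def run(messages, max_turns=5):
    for _ in range(max_turns):
        reply = fake_model(messages)     # step (2): model emits tool_call (or answer)
        messages.append(reply)
        if "tool_calls" not in reply:
            return reply["content"]      # step (5): final answer, loop ends
        for call in reply["tool_calls"]:
            result = TOOLS[call["name"]](**json.loads(call["arguments"]))  # (3) execute
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": json.dumps(result)})               # (4) return result
    raise RuntimeError("turn cap reached")

print(run([{"role": "user", "content": "Weather in Paris?"}]))
# → Paris is 14°C with rain.
```

The loop body is the same whether it runs once or ten times; an agent is just this loop allowed to keep going.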

OpenAI, Anthropic, Google, Mistral — all support it. Parameter types use JSON Schema. Modern models can issue parallel tool calls.
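A parallel tool call just means one assistant turn carries several calls. A minimal dispatcher, using simplified plain-dict calls rather than any provider's exact wire format:

```python
import json

def dispatch_all(tool_calls, registry):
    """Execute every call from one assistant turn and collect results
    keyed by call id, so each can be returned as its own tool message."""
    results = {}
    for call in tool_calls:
        fn = registry[call["name"]]
        args = json.loads(call["arguments"])
        results[call["id"]] = fn(**args)
    return results

registry = {"get_weather": lambda city: {"city": city, "temp": 14}}
parallel = [
    {"id": "c1", "name": "get_weather", "arguments": '{"city": "Paris"}'},
    {"id": "c2", "name": "get_weather", "arguments": '{"city": "Oslo"}'},
]
print(dispatch_all(parallel, registry))
# → {'c1': {'city': 'Paris', 'temp': 14}, 'c2': {'city': 'Oslo', 'temp': 14}}
```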

Analogy

Like a phone call with a secretary. The secretary can't write reports, but can say "fetch the Q3 report from that drawer." The instruction is verbal; the action is physical. LLM = secretary, function calling = turning "fetch" into a structured command, your app = the person who actually opens the drawer.

Real-world example

A customer asks "what's the status of my order?" The bot sends the message plus a get_order_status(order_id) tool definition to the LLM. The LLM first calls ask_user(question: "What's your order number?"). The customer replies "TR-9921." Now the LLM emits get_order_status(order_id: "TR-9921"). The backend hits the real API, returns "Shipped, arriving tomorrow." The LLM wraps that in natural language: "Your order is on its way and will arrive tomorrow 🚚"
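Laid out as a raw message transcript, that exchange looks like the list below. The tool names (`ask_user`, `get_order_status`) come from the example above, and the shapes are a simplified OpenAI-style approximation, not any provider's exact format:

```python
import json

transcript = [
    {"role": "user", "content": "what's the status of my order?"},
    {"role": "assistant", "tool_calls": [          # model asks for the order number
        {"id": "t1", "name": "ask_user",
         "arguments": json.dumps({"question": "What's your order number?"})}]},
    {"role": "tool", "tool_call_id": "t1", "content": "TR-9921"},
    {"role": "assistant", "tool_calls": [          # now it can query the backend
        {"id": "t2", "name": "get_order_status",
         "arguments": json.dumps({"order_id": "TR-9921"})}]},
    {"role": "tool", "tool_call_id": "t2",
     "content": "Shipped, arriving tomorrow."},
    {"role": "assistant",
     "content": "Your order is on its way and will arrive tomorrow 🚚"},
]

# Two full tool turns happen before the final natural-language reply:
print(sum(1 for m in transcript if m["role"] == "tool"))  # → 2
```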

Code examples
OpenAI tools API · weather example (Python)
from openai import OpenAI
import json

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools,
)

call = resp.choices[0].message.tool_calls[0]  # may be absent if no tool was chosen
args = json.loads(call.function.arguments)    # arguments arrive as a JSON string
# → call.function.name == "get_weather"
# → args == {"city": "Paris"}

# You then call the real API, append the result as a {"role": "tool"}
# message carrying the matching tool_call_id, and the model produces
# the final answer on the next request.
When to use
  • Fetching real-time data (weather, prices, stock)
  • Precise math (LLMs are weak at arithmetic — use a calculator tool)
  • Side-effect actions: send email, create record, trigger payment
  • When structured output is required — a JSON schema guarantees shape
When not to use
  • The info is already in the prompt — don't add a needless round-trip
  • One-shot creative content (poems, blog drafts) — no tools needed
  • Latency-critical UI flows — every tool turn is an extra round-trip
Common pitfalls

Stuffing too many tools

Give it 20+ tools and the model gets confused and picks the wrong one. Start with 5–10, and organize them hierarchically if you need more.

Skipping schema validation

The model occasionally invents parameters or gets types wrong. Always validate against the schema before executing.
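A minimal, stdlib-only sketch of that pre-execution check against an object schema like the weather one above (a real app might reach for the `jsonschema` package instead):

```python
import json

def validate_args(schema, raw_arguments):
    """Check model-emitted arguments against a JSON Schema-style object
    schema: required keys present, no invented keys, basic type match."""
    args = json.loads(raw_arguments)
    props = schema["properties"]
    type_map = {"string": str, "number": (int, float),
                "integer": int, "boolean": bool}
    for key in schema.get("required", []):
        if key not in args:
            raise ValueError(f"missing required argument: {key}")
    for key, value in args.items():
        if key not in props:
            raise ValueError(f"unknown argument: {key}")   # invented parameter
        if not isinstance(value, type_map[props[key]["type"]]):
            raise ValueError(f"{key}: expected {props[key]['type']}")
    return args

schema = {"type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]}

print(validate_args(schema, '{"city": "Paris"}'))  # → {'city': 'Paris'}
# validate_args(schema, '{"town": "Paris"}') raises ValueError
```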

Infinite loops

When the model can't reach a conclusion, it will call the same tool repeatedly. Cap turns (e.g. 10) and add loop detection.
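Both guards fit in a few lines. This sketch assumes a `next_call()` callback that yields the model's next `(name, arguments)` tuple, or `None` when it's done; repeating the exact same call counts as a loop:

```python
def guarded_loop(next_call, max_turns=10):
    """Cap tool turns and bail out when the identical call repeats,
    a cheap form of loop detection."""
    seen = set()
    for _ in range(max_turns):
        call = next_call()
        if call is None:
            return "done"
        if call in seen:
            return "loop detected"   # same tool, same args: abort
        seen.add(call)
    return "turn cap reached"

# A stuck model that keeps issuing the identical call:
stuck = iter([("get_weather", '{"city": "Paris"}')] * 20)
print(guarded_loop(lambda: next(stuck)))  # → loop detected
```

Exact-match detection is deliberately crude; slightly varied arguments slip past it, which is why the hard turn cap stays as the backstop.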