System Prompt
The persistent instruction
A standing instruction placed at the top of every conversation that defines the model's role, tone, rules, and boundaries.
When you talk to an LLM there are really two kinds of messages: the system prompt and the user message. The system prompt sits at the top of every conversation as a persistent instruction. The model sees it before producing every reply.
It typically contains: role definition ("You are a Turkish recipe assistant"), rules ("answer in 5 lines max"), format ("use bullet points"), constraints ("don't discuss politics"), and static context ("our company is Avva, founded 2018"). The user never sees it — it runs behind the curtain.
Modern models (Claude, GPT-4) are trained to give the system prompt extra weight. The same instruction buried in a user message might get ignored; placed in the system prompt, the model tends to stick to it.
A new waiter starting at your restaurant gets the menu, dress code, and "always say 'sir' not 'mate'" rules on day one. Those notes stay in their head all shift. The system prompt does the same — no matter how long the chat goes, the model keeps glancing back at "what's my role, what are my rules?"
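In API terms that persistence is literal: the same system string is re-sent with every call while only the message history grows. A minimal sketch with the Anthropic Python SDK, assuming the same setup as the example further down (the prompt text is made up; the model string is copied from that example):

from anthropic import Anthropic

client = Anthropic()
SYSTEM = "You are a terse assistant. Answer in one sentence."  # never changes

history = []
for question in ["What is an LLM?", "And what is a system prompt?"]:
    history.append({"role": "user", "content": question})
    reply = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        system=SYSTEM,       # re-sent on every single call
        messages=history,    # only this part grows turn by turn
    )
    history.append({"role": "assistant", "content": reply.content[0].text})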
You're building an e-commerce support bot. System prompt:

"You are Avva's customer support assistant. Reply only in Turkish. Use the get_order(id) tool for order info. Never offer a discount — escalate to a human for that. Keep replies under 4 sentences."

User asks "how many hours for shipping?" and the model replies in Turkish, short, using the tool. User asks "give me 50% off" and the model says "I'll transfer you to a representative." The system prompt nudges every turn.
from anthropic import Anthropic

client = Anthropic()

SYSTEM = """You are Avva's customer support assistant.
Reply only in Turkish. Maximum 4 sentences.
Never promise a discount — escalate to a human."""

msg = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=SYSTEM,  # <-- persistent instruction
    messages=[
        {"role": "user", "content": "How long until my order ships?"}
    ],
)
print(msg.content[0].text)
# Model returns a Turkish, short, rule-following reply.

Use it for:
- Giving the model a role/persona (assistant, tutor, doctor, etc.)
- Locking output format (JSON, markdown, length)
- Do/don't rules (no politics, no leaking secrets)
- Static context: company info, date format, language preference
Not the place for:
- Info that changes per request: put it in the user message or context (see the sketch after this list)
- Very long document context — use RAG instead of stuffing the system prompt
- One-off prompt experiments — a plain user message is enough
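The first "don't" above is worth making concrete: pin the static rules in the system prompt and inject the per-request data into the user message. A sketch of that split; the order details and the answer helper are invented for illustration:

from anthropic import Anthropic

client = Anthropic()

SYSTEM = "You are Avva's support assistant. Reply in Turkish, max 4 sentences."  # static rules

def answer(question: str, order_status: str) -> str:
    # order_status changes per request, so it travels in the user message,
    # not in the system prompt
    msg = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        system=SYSTEM,
        messages=[{
            "role": "user",
            "content": f"Order status: {order_status}\n\nCustomer question: {question}",
        }],
    )
    return msg.content[0].text

print(answer("Where is my package?", "shipped May 1, in transit"))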
Common mistakes:

Writing it 5000 tokens long
A 5000-token system prompt is re-sent with every request, leaving 5000 fewer tokens of context for the conversation itself. Lead with the critical rules, drop the rest.
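You can measure the cost instead of guessing: the Anthropic SDK exposes a token-counting endpoint. A quick audit sketch (the API requires at least one message, hence the dummy turn; the SYSTEM string stands in for your real prompt):

from anthropic import Anthropic

client = Anthropic()
SYSTEM = "You are Avva's customer support assistant. ..."  # your real prompt here

count = client.messages.count_tokens(
    model="claude-sonnet-4-6",
    system=SYSTEM,
    messages=[{"role": "user", "content": "ping"}],  # minimal dummy turn
)
print(count.input_tokens)  # this many tokens ride along on every request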
Contradictory rules
'Never do math' plus 'answer every question' guarantees the model breaks one of them. Keep the rules consistent.
Treating it as secret
Some models will leak the system prompt when asked to 'recite the previous message'. Don't put API keys or user secrets in it.
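The safer pattern: the system prompt only names the tool, and the credential lives in the tool's server-side implementation, where the model never sees it. A sketch; the env var name and API endpoint are made up:

import os
import requests

ORDERS_API_KEY = os.environ["ORDERS_API_KEY"]  # lives in the environment, never in a prompt

SYSTEM = "Use the get_order(id) tool for order info."  # names the tool, carries no secret

def get_order(order_id: str) -> dict:
    # The key is attached here, server-side; the model only ever sees the JSON result.
    resp = requests.get(
        f"https://api.example.com/orders/{order_id}",
        headers={"Authorization": f"Bearer {ORDERS_API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()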