arrow_backAll articles
 How to Write a System Prompt That Actually Works
Tutorials

How to Write a System Prompt That Actually Works

Javier Echeverria··4 min read

If you've ever built anything on top of an AI API, or even just set up a custom AI assistant for personal use, you've had to write a system prompt. And if you've spent any time doing that, you've probably discovered that writing a good one is harder than it looks.

A system prompt that works well makes the model's behavior predictable, consistent, and appropriate for your use case. A system prompt that doesn't work well produces a model that ignores parts of your instructions, behaves inconsistently, or adds things you didn't ask for and omits things you need. The difference between those two outcomes is almost entirely in how the system prompt is written.

What a system prompt actually does

A system prompt is a set of instructions you give the model before any conversation starts. It tells the model who it is, what it's supposed to do, how it should behave, and what it should avoid. Everything the model does in that context is filtered through those instructions.

The model treats the system prompt differently from user messages. It carries more weight, it applies throughout the entire conversation, and it sets the frame within which everything else happens. A well-written system prompt means you don't have to repeat your instructions with every message, because the model already knows what's expected of it.

For production applications, the system prompt is often the most important piece of engineering in the whole setup. It determines whether the model behaves reliably across thousands of different user inputs or whether it goes off the rails in edge cases you didn't anticipate.

The most common mistakes people make

The first and most common mistake is being too vague. Instructions like "be helpful and professional" don't give the model much to work with because every AI model already thinks it's being helpful and professional. Vague instructions get ignored not because the model is bad at following them but because there's nothing specific enough to follow.

The second mistake is trying to cover every possible edge case with explicit instructions. System prompts that try to anticipate and address every situation the model might encounter end up being extremely long, internally inconsistent, and paradoxically harder for the model to follow because there's too much to keep track of. A shorter, clearer system prompt that establishes strong principles tends to outperform a long list of specific rules.

The third mistake is writing the system prompt in a way that sounds like it's explaining the tool to a human user rather than instructing a model. The model doesn't need background context or motivation. It needs clear, direct instructions about what to do and how to do it.

According to Wired's coverage of enterprise AI deployments, the quality of system prompts is one of the most consistent differentiators between AI implementations that work well in production and those that require constant manual intervention and correction.

What a good system prompt includes

A good system prompt answers a few core questions clearly and concisely. What is the model's role? What should it do? What should it not do? What format should responses be in? What tone should it use?

The role definition is worth spending time on because it sets the context for everything else. "You are a customer support assistant for a software company" gives the model a frame of reference that influences how it handles ambiguous situations, what level of technical language it uses, and what kinds of requests fall inside or outside its scope.

The dos and don'ts should be specific rather than general. Instead of "don't be rude," which is too vague to be useful, something like "if a user becomes hostile, respond once with a calm acknowledgment and offer to continue helping, then keep all subsequent responses brief and neutral" gives the model actual behavior to follow.

Format instructions are underused in most system prompts. If you want the model to always respond in a certain structure, always use a certain length range, or always include certain elements like a summary or a next step, put that in the system prompt explicitly. The model will follow format instructions consistently when they're clear.

Keeping it short without losing coverage

The tension in writing a system prompt is between completeness and conciseness. You want to cover enough to make the model's behavior predictable, but not so much that the prompt becomes unwieldy.

The approach that works best is to focus on principles rather than rules. Instead of writing a rule for every situation, write clear principles that the model can apply to situations you didn't anticipate. A principle like "when in doubt, ask a clarifying question rather than making an assumption" covers thousands of edge cases that no list of rules could fully address.

Keeping your system prompt under a thousand tokens is a good target for most applications. Above that, you start paying a meaningful cost per request just for the prompt itself, and the complexity often starts to work against you. The Token Counter on Prompt Toolbox makes it easy to check where you are as you're writing, so you can catch bloat before it becomes expensive.

Testing your system prompt before it goes live

The only way to know if a system prompt works is to test it with a wide range of inputs, including the edge cases and unusual requests you hope users won't send but know some of them will. A system prompt that handles your happy path perfectly and falls apart on unusual inputs is not a working system prompt.

Test with inputs that are ambiguous, inputs that are off-topic, inputs that try to get the model to ignore its instructions, and inputs that are just weird in ways you didn't anticipate. The goal is to find the gaps in your prompt before your users do, and then decide whether to address those gaps with additional instructions or accept them as acceptable behavior at the edges of your use case.

Try the tools