For AI builders, cloud architects, and the engineering leaders who need to know what they’re encoding when they write a system prompt.
In 2013, a research paper I co-authored was accepted at an international conference. Three of us were sponsored and approved to attend. One of us had their visa denied. It happens. The fourth author on the paper, who hadn’t been sponsored yet, already had a valid visa and was ready to travel. We asked the program coordinator to swap the two: both were part of the same paper, no bookings had been made yet, and the sponsored budget was already in place for three presenters. No logistical consequence to anyone.
The coordinator refused. The confirmation deadline had passed. Substitutions were not permitted after the deadline, she said.
Two of us attended a conference that had been funded for three.
I wrote about this incident in 2020. The point I was making then was about humans, about the risk of abandoning judgment in favour of rule-execution. The coordinator wasn’t thinking. She was executing. The rule had become a substitute for reasoning. Two months of lead time, a ready substitute, a clearly solvable problem. The rule closed the door anyway.
I vented the frustration and moved on.
A decade later, building an AI agent to diagnose cloud network faults, I met her again. This time, I had built her.
The Inversion I Didn’t Anticipate
When I wrote that 2020 piece, the fear ran one way: humans becoming rule-followers. The machine was a metaphor for what we risk losing.
What I didn’t anticipate was arriving at the same problem from the other direction. In 2026, I’m not worried about humans acting like machines. I’m trying to make a machine act like the best version of a human: the engineer who reads intent, not procedure, who knows when the rule serves the purpose and when it doesn’t.
It’s harder than it sounds. It took me five attempts, and each time I thought I was making progress, I wasn’t yet seeing what I was actually building.
Five Attempts to Build a Detective
The agent (I call it Network Ghost Agent) diagnoses faults in cloud networks: dropped traffic, misconfigured routing, firewall rules that exist on paper but fail in practice. The failures that are invisible until something breaks in production. I worked through five phases progressively building the agent, making it incrementally better in each iteration.
SYSTEM PROMPT EVOLUTION — Ghost Agent
────────────────────────────────────────────────────────────────────────────
Phase 1 │ Escalation Model │ Structured investigation ladder:
│ │ local diagnostics →
│ │ cloud control-plane reads →
│ │ packet capture
│ │ Human approval gates for risky actions.
│ │ Worked for the scenarios it was built for.
│ │ Brittle beyond them.
────────┼───────────────────────┼────────────────────────────────────────────
Phase 2 │ Prescriptive Commands │ Specific tool commands encoded directly.
│ │ e.g., "run: az vm run-command invoke
│ │ tc qdisc show on BOTH source and
│ │ dest VMs."
│ │ Right when the situation matched exactly.
│ │ Failed on adjacent problems.
────────┼───────────────────────┼────────────────────────────────────────────
Phase 3 │ Principles │ Commands replaced with reasoning guidance.
│ │ e.g., "throughput degradation triggers
│ │ investigation of OS-level traffic shaping
│ │ or filtering on source and/or destination
│ │ endpoints."
│ │ Better generalisation. Verification still
│ │ self-applied, not structurally enforced.
────────┼───────────────────────┼────────────────────────────────────────────
Phase 4 │ Linguistic Gates │ Hedging language to catch uncertain answers.
│ │ e.g., "Before stating your diagnosis,
│ │ assess your confidence. If you are not
│ │ certain, say so explicitly and list what
│ │ evidence is missing."
│ │ Failed: the agent was confident AND wrong.
│ │ A confident wrong answer sails through a
│ │ linguistic hedge untouched.
────────┼───────────────────────┼────────────────────────────────────────────
Phase 5 │ Structural Gate │ Before any conclusion is accepted:
│ │ 1. Name each symptom exactly as observed
│ │ 2. Identify the proposed mechanism
│ │ 3. Ask: "Would THIS mechanism ALONE
│ │ produce THIS symptom?"
│ │ 4. Cite direct evidence the mechanism
│ │ is active
│ │
│ │ Verification is now a proof requirement,
│ │ not a principle the agent self-applies.
│ │ Confident wrong answers are structurally
│ │ blocked.
Procedure or Intent: That Is the Real Choice
The shift from Phase 2 to Phase 3 looks like a technical improvement. It was that, but it was also something bigger.
The prescriptive command said: run this tool, on these targets, right now. It encoded procedure, what to do in the specific situation I had already seen. When a new situation arrived that looked similar but wasn’t identical, the procedure failed. Not because the command was wrong. Because procedures are written for the situation you anticipated, not the one you’re in.
The principled version encoded intent instead: the reason you would run that tool in the first place, the diagnostic logic behind the step. When a new situation arrived, the principle still held. The agent could navigate because it understood why, not just what.
This is the same choice the program coordinator faced in 2013. She had a procedure: no substitutions after the confirmation deadline. It was correct for the situation it was written for, preventing last-minute chaos when logistics are already locked in. Applied two months out, with no bookings made and a ready substitute already on the paper, it violated everything the rule was meant to protect. She chose procedure. A funded slot went unused.
Every AI builder likely faces this choice, in any domain. Do you want the agent to follow your procedure, to do what you did in the situations you already anticipated? Or do you want it to achieve your intent, to reason toward the outcome you actually need, including the situations you haven’t seen yet?
Procedure gives you coverage for what you’ve already seen. Intent gives you something to reason from when you haven’t.
When the Agent Was Confident and Wrong
Encoding principles was a meaningful step forward, but it wasn’t enough.
A test scenario with two independent faults active simultaneously. Two distinct problems on the same network path, each producing its own symptom. The agent identified the first fault, confirmed its mechanism, attributed the symptoms it could explain to that mechanism, and closed the investigation. The second fault was never attributed. The conclusion was confident, well-cited, and incomplete.
The Phase 4 fix was linguistic: add hedging instructions so the agent would flag uncertainty before concluding. It didn’t work. The agent wasn’t uncertain. It had found a real fault. The hedge never fired.
That’s the gap linguistic gates can’t close. Confident wrong answers and uncertain wrong answers are not the same problem. A linguistic gate catches the second; it prompts the agent to pause when it doesn’t know. It does nothing when the agent already believes it does know. The instruction “if you are not certain, say so” is invisible to an agent that is certain.
The structural gate (Phase 5) changed what counts as a valid conclusion. Each proposed mechanism has to pass an isolation test: would this cause, by itself, independently produce exactly this symptom? If it can’t, the hypothesis fails regardless of how confident the agent sounds. It has to keep investigating; it can’t close the case while symptoms remain unattributed.
Under this gate, the first fault passes for the symptoms it explains. It fails for the symptom it can’t explain. The agent finds the second fault. Both are attributed. Case closed correctly.
The full technical detail (the specific network scenario, the exact system prompt before and after each phase, and the reasoning behind the checklist design) is in the System Prompt Journey.
What You Are Actually Encoding
I’ve come to think every system prompt is a theory of what rules are for.
If you encode outcomes, you get something that looks up answers. It’s right when the situation matches a covered case and confidently wrong when it doesn’t. The confidence isn’t earned; it’s a side effect of never having been required to verify.
If you encode the structure of valid reasoning instead, you get something different. Not a system that knows the right answer, but one that knows how to find it, and knows when to stop.
A few things follow from this, that I think hold in any domain.
A system prompt is a specification, not a settings file. A settings file configures known behaviour: you’ve already defined what the system should do and you’re adjusting parameters. A specification defines what correct looks like, what constitutes a valid investigation, a valid conclusion, a valid diagnosis, and that standard applies to cases you’ve never seen, not just the ones you have. If your system prompt reads as a list of if-then rules, it’s closer to a settings file. If it defines the standard the reasoning must meet, it’s a specification.
Writing rules for your test cases is not the same as encoding your method. Rules that fix the specific scenarios you tested will pass those tests and fail the adjacent ones you didn’t think of. Encoding the reasoning method, how an expert actually works through a problem, gives the agent something it can apply to cases it’s never encountered. It’s the difference between giving someone last year’s exam answers and teaching them the subject.
Structural failures need structural fixes. Linguistic gates catch uncertain wrong answers; they can’t catch confident ones. If the architecture lets the agent conclude before verifying, no amount of hedging language will stop it. The fix has to be at the level of what the architecture accepts as a valid conclusion, not at the level of how the agent expresses its answer.
The coordinator wasn’t uncertain. She’d checked the rule, it was clear, she’d done exactly what the system required of her.
The Test, and the Limit
Before you ship the next version of your system prompt, try one thing.
Take any conclusion your agent produces. Ask whether it could have arrived there through the wrong mechanism, one that fits the surface pattern but fails the underlying logic, and whether your system prompt would have caught it. If the answer is no, you’ve built a lookup engine. It will be right at the cases you anticipated and confidently wrong at the ones you didn’t.
But there’s something these five phases didn’t resolve, and I want to be direct about it. It would be hard for an agent to approximate the judgment of a human, even a well-built one.
The structural gate makes the agent a more rigorous reasoner within the scope of what it can observe. It doesn’t give the agent judgment about what the observation means in context. It doesn’t carry operational history, risk context, the pattern across years of similar incidents that a senior engineer holds in their head. It doesn’t know what it doesn’t know, and unlike a person who can usually sense the edges of their own knowledge, an agent will typically produce an answer whether or not the evidence truly supports one. Good system prompt design manages that tendency. It doesn’t eliminate it.
The 2013 incident was about whether a human would exercise judgment when a rule conflicted with intent. What I was doing in 2026 was asking the same question of a machine, for bounded, observable problems. The answer is yes, within a boundary.
The harder question is where that boundary sits, and who holds the judgment that lives on the other side of it.
