For network engineers, cloud architects, and the product leaders who support them.
The Problem
Cloud networks are opaque by design. You get a control plane — route tables, NSG rules, peering state, effective routes — and a data plane you can only observe indirectly, through packet captures that take minutes to provision and reports that take hours to read.
When something breaks, the failure is rarely where it appears. A connectivity outage looks like a firewall problem. The firewall is clean. It’s a routing problem. The route table looks correct. The effective route on the NIC is overriding it. The portal shows all green throughout.
The senior engineer who has seen this pattern before will know to look at the effective route table. The junior engineer will file a ticket. The team without that senior engineer available will spend two hours in the wrong layer.
This is the problem the Ghost Agent was built to solve: encoding the investigation methodology of a senior network forensics engineer into an autonomous CLI that requires your explicit approval before any risky action, maintains a full audit trail, and works directly against your cloud infrastructure.
How Teams Debug Today
Here is what a real cloud network investigation looks like today:
Engineer opens the cloud portal
→ Checks firewall / security group rules manually (UI, no audit record)
→ "Looks fine"
→ Checks route tables (maybe — if they think of it)
→ Misses the effective route divergence
→ Opens a ticket: "network issue, NSG checked, routing unknown"
→ Second engineer takes over cold, no context transfer
→ Hours later: someone with packet capture experience runs tcpdump
→ 500-line output, interpreted manually
→ Root cause: stale UDR pointing to a decommissioned NVA
→ Total time: 2-4 hours, 2 engineers, no durable audit trail
Three failures compound: investigation stops at the first clean layer; local probe evidence is weighted the same as cloud API evidence; and the escalation path exists in a senior engineer’s head and nowhere else.
What Ghost Agent Does
Ghost Agent is a conversational CLI. You describe a symptom in plain English. It forms hypotheses, runs diagnostics autonomously, stops and waits for your approval before anything mutative, captures wire traffic when needed, and produces a forensic RCA with a full command audit trail.
╔══════════════════════════════════════════════════════════════╗
║ GHOST AGENT (ghost_agent.py) ║
║ Startup Handler │ Tool-Use Loop │ RCA Report Generator ║
║ • Orphan detect │ • Gemini API │ • Reads audit JSONL ║
║ • Session resume │ • Dispatch │ • Writes RCA .md ║
╚═════╤═════════════╧═════════════════╧══════════════╤═════════╝
│ │ (read-only)
shell.execute() orchestrate() ./audit/
│ │
▼ ▼
╔══════════════╗ ╔══════════════════╗
║ Safety Shell ║ ║ Cloud ║
║ ║ ║ Orchestrator ║
║ 4-tier ║ ║ ║
║ classify ║ ║ Capture ║
║ Approval ║ ║ lifecycle ║
║ gate ║ ║ Blob download ║
║ Audit JSONL ║ ╚════════╤═════════╝
╚══════╤═══════╝ │ invokes
│ ╔════════╧════════╗
│ ║ PCAP Forensic ║
│ ║ Engine ║
│ ║ ║
│ ║ tshark extract ║
│ ║ Semantic JSON ║
│ ║ Gemini RCA ║
│ ╚═════════════════╝
│ (also standalone)
▼
az CLI, ping, dig, traceroute, ss
The escalation ladder is encoded, not assumed:
Level 1 — Local probes (auto-approved, runs on engineer's machine)
ping, dig, traceroute, ss, curl
│ inconclusive → escalate
▼
Level 2 — Cloud API reads (auto-approved, read-only verb)
az nsg rule list, az route-table list,
az nic show-effective-route-table, az vm show
│ packet-level evidence needed → escalate
▼
Level 3 — Network Watcher packet capture (approval required)
capture_traffic → burst-poll → download → PCAP forensic analysis
│ investigation complete
▼
Level 4 — RCA generation with full evidence citation
Every action is classified before it executes. Read-only cloud operations are
auto-approved. Anything mutative — update, create, delete — stops and waits
for you. You see the exact command, the reasoning behind it, and you approve, deny,
or modify it. Your decision is written to the audit trail.
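A sketch of what writing one such decision to an append-only JSONL audit trail might look like (the field names are illustrative, not Ghost Agent's actual record schema):

```python
import json
import time
from pathlib import Path

def record_decision(audit_path: Path, command: str, classification: str,
                    decision: str, reasoning: str) -> dict:
    """Append one approval decision as a single JSON line.
    Field names are illustrative, not the project's actual schema."""
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "command": command,
        "classification": classification,  # e.g. SAFE / RISKY
        "decision": decision,              # approve / deny / modify
        "reasoning": reasoning,
    }
    with audit_path.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

One line per decision keeps the trail greppable and trivially parseable after the fact.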
A Production Investigation, Step by Step
The engineer’s prompt:
“We’re seeing intermittent TCP connection failures from tf-source-vm to tf-dest-vm on port 8080. Control plane looks clean — NSGs and routes have been checked. We need wire-level evidence.”
[Ghost Agent] Forming hypotheses:
H1: NSG deny rule on port 8080 — not yet ruled out despite reported check
H2: Routing anomaly causing intermittent path failures
H3: TCP-level issue — MTU mismatch, retransmission storm, or RST injection
[Shell] SAFE — auto-approved: az network nsg rule list --nsg-name tf-dest-vm-nsg ...
[Shell] SAFE — auto-approved: az network nic show-effective-route-table --resource-group nw-forensics-rg ...
[Ghost Agent] NSG is clean — all rules ALLOW on port 8080. H1 REFUTED.
Effective route table confirms VNetLocal path — no UDR overrides. H2 REFUTED.
Control plane is entirely clean. H3 requires wire-level evidence.
Escalating to packet capture.
┌─────────────────────────────────────────────────────────────────────┐
│ SAFETY SHELL ALERT │
│ TIER: 2 │ CLASSIFICATION: RISKY │
│ COMMAND: az network watcher packet-capture create --vm tf-sour… │
│ RISK: mutative operation — create verb (Tier 2) │
│ REASONING: H3 requires wire-level evidence. All control-plane … │
│ │
│ [A]pprove [D]eny [M]odify command │
└─────────────────────────────────────────────────────────────────────┘
Your choice: A
[Shell] APPROVED: az network watcher packet-capture create --vm tf-sour…
[Cloud Orchestrator] Capture created: ghost-tf-source-vm_20260222T091543
Polling status ... Succeeded
Downloading ghost-tf-source-vm_20260222T091543.cap from nwlogs080613 ...
Running PCAP forensic analysis ...
[Ghost Agent] Forensic report received. Key findings:
TCP retransmission rate: 38% (threshold: 5%)
ICMP type 3 code 4 (Fragmentation Needed / PMTUD failure): 47 packets
Oversized frames silently dropped — source not adjusting segment size
Root cause: tf-source-vm has jumbo frames enabled (MTU 9000).
The network path MTU is 1500. Every TCP session that produces segments
larger than 1500 bytes fails intermittently.
══════════════════════════════════════════════════════
RCA REPORT WRITTEN: ./audit/ghost_rca_ghost_20260222_091831.md
Confidence: high | Turns: 5
══════════════════════════════════════════════════════
Five turns. The Safety Shell blocked on the packet-capture create, waited for the
engineer’s approval, logged the decision, then continued. The RCA names the exact
failure — an MTU mismatch invisible to the control plane — with the specific ICMP
and retransmission evidence cited by audit_id. Every command, every approval
decision, and every hypothesis state transition is in the audit trail.
Use Cases, All Tested
Each scenario represents a distinct class of production failure and calls for a different investigation strategy. These four were selected to show the range of what the system handles — from pure control-plane analysis to wire-level forensics to multi-component relational failures.

| Use Case | What Breaks | What the Investigation Reveals |
|---|---|---|
| B — The Wire Doesn’t Lie | Intermittent TCP issue, control plane clean | Wire-level PCAP report: retransmission rate, DNS latency, ICMP unreachables — evidence that no control-plane query can produce |
| D — The Two-Headed Hydra | Two services fail after an NSG maintenance window | Two deny rules at different priorities — two engineers, two separate changes, attributed individually in the RCA |
| E — The Phantom Route | NSG clean, portal green, traffic vanishes | Stale UDR pointing to an NVA that was planned but never provisioned |
| F — The Silent Gatekeeper | Storage unreachable, all control plane clean | Service endpoint removed during routine subnet maintenance — invisible unless both sides of the relationship are checked in the same step |
Use Case F deserves a specific call-out. The storage account firewall correctly allows traffic from the subnet. The subnet itself has no error condition. Neither component shows a problem in isolation. The failure is only visible when you check both sides of the service endpoint relationship in the same diagnostic step — something that requires knowing to look for it. Ghost Agent finds it in under five minutes.
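A sketch of what "checking both sides in the same step" might look like as a pure comparison, assuming the dict shapes roughly mirror `az network vnet subnet show` and `az storage account show` output (treat the exact field names as assumptions):

```python
def endpoint_relationship_ok(subnet: dict, storage_rules: dict, subnet_id: str) -> bool:
    """Both sides must agree: the subnet lists Microsoft.Storage among its
    service endpoints, AND the storage account's network rules reference
    that subnet. Either side alone looks healthy in isolation."""
    subnet_side = any(
        ep.get("service") == "Microsoft.Storage"
        for ep in subnet.get("serviceEndpoints", [])
    )
    storage_side = any(
        rule.get("virtualNetworkResourceId", "").lower() == subnet_id.lower()
        for rule in storage_rules.get("virtualNetworkRules", [])
    )
    return subnet_side and storage_side
```

The point is structural: the check takes both resources as inputs, so the relationship failure cannot be missed by inspecting one component at a time.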
What It Takes to Run
Scope: Currently targets Azure. The Safety Shell and PCAP Forensic Engine are cloud-agnostic — extending the investigation layer to AWS, GCP, or OCI requires replacing the Azure CLI calls with the equivalent cloud CLI; the safety classification, hypothesis tracking, and forensic analysis pipeline are unchanged.
Prerequisites:
- Python 3.12+ with uv
- Azure CLI authenticated: az login
- A Gemini API key from aistudio.google.com — the free tier is sufficient for most investigations
- NetworkWatcherAgentLinux extension on target VMs (for packet capture use cases only)
Configuration: Copy demo/sample_config.env to demo/config.env and fill in
your Azure resource names and Gemini API key. All credentials are loaded from this
file at runtime — no environment variable setup required beyond that.
Cost per investigation: A typical control-plane investigation (Use Cases D, E, F) costs under $0.05 in Gemini API calls at default model settings. Packet capture runs add Azure Network Watcher costs (~$0.10/capture-hour) plus blob storage egress.
Start here: Use Case E if you want pure control-plane diagnosis with no storage account needed — it runs to a confirmed root cause in under five minutes. Use Case B if you have Azure Network Watcher and a storage account configured — Ghost Agent handles the full capture lifecycle automatically: creates the capture, polls for completion, downloads the file, runs forensic analysis, and generates the RCA report.
Four Independent Tools
Ghost Agent is the investigation layer. It is assembled from three reusable components that each stand alone — you can drop any of them into your own tooling without taking the rest.
🛡️ Agentic Safety Shell
The deterministic guardrail between any AI agent and your infrastructure. Every proposed command passes through a four-tier classification pipeline before executing:
Tier 0 — Forbidden list: rm -rf /, mkfs, fork bombs → unconditionally blocked
Tier 1 — Always-safe list: ping, dig, traceroute, az list/show → auto-approved
Tier 2 — Verb matching: update, create, delete → requires approval
Tier 3 — Dangerous patterns: sudo, &&, $(...) injection → requires approval
Default — Unknown input: anything unrecognised → requires approval
The default tier is the critical design decision: anything unrecognised is treated as requiring approval, not as safe. Classification is deterministic — no LLM involved. Every tier is independently unit-testable with adversarial inputs.
Drop it between your agent and any shell execution path.
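A minimal sketch of such a pipeline, with illustrative stand-in patterns (the real shell's lists are larger; the function name and tier labels here are assumptions):

```python
import re

# Illustrative stand-ins for the shell's real pattern lists.
FORBIDDEN = [r"rm\s+-rf\s+/", r"\bmkfs\b"]                        # Tier 0
DANGEROUS = [r"\bsudo\b", r"&&", r"\$\("]                         # Tier 3
MUTATIVE_VERBS = [r"\b(update|create|delete)\b"]                  # Tier 2
ALWAYS_SAFE = [r"^(ping|dig|traceroute|ss|curl)\b",
               r"^az\s.*\b(list|show)\b"]                         # Tier 1

def classify(command: str) -> str:
    """Deterministic, ordered checks. Dangerous patterns are tested before
    the allowlist so injection inside an otherwise safe command still gates."""
    if any(re.search(p, command) for p in FORBIDDEN):
        return "BLOCKED"
    if any(re.search(p, command) for p in DANGEROUS + MUTATIVE_VERBS):
        return "NEEDS_APPROVAL"
    if any(re.search(p, command) for p in ALWAYS_SAFE):
        return "AUTO_APPROVED"
    return "NEEDS_APPROVAL"  # default tier: unknown means not safe
```

The final return is the design decision the text calls critical: the fall-through is approval, never execution.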
🔍 Agentic PCAP Forensic Engine
AI-powered packet analysis. Takes a .pcap or .cap file, runs tshark to extract
structured per-protocol metrics (TCP, DNS, ICMP, ARP), compresses the result to a
Semantic JSON summary at up to 95% data reduction, and runs Gemini forensic reasoning
over it. Produces a Markdown report with executive summary, ranked anomaly table, and
actionable remediation commands.
What it detects: TCP retransmission storms, PMTUD failures, ARP spoofing, DNS DGA patterns, NXDOMAIN spikes, ICMP unreachable correlation, latency percentile regressions.
Run it standalone: python pcap_forensics.py your-capture.pcap
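As a rough sketch of the extraction step, one of the metrics above (retransmission rate) can be computed from tshark field output; the tshark flags used here are standard, but this is a simplification of what the engine actually extracts:

```python
import subprocess

def retransmission_rate(pcap_path: str, run=subprocess.run) -> float:
    """Fraction of TCP packets tshark flags as retransmissions.
    `run` is injectable for testing; the real engine extracts many
    more per-protocol fields than this."""
    out = run(
        ["tshark", "-r", pcap_path, "-Y", "tcp",
         "-T", "fields", "-e", "tcp.analysis.retransmission"],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = out.splitlines()          # one row per TCP packet
    if not rows:
        return 0.0
    flagged = sum(1 for r in rows if r.strip())  # non-empty = flagged
    return flagged / len(rows)
```

A rate like the 38% seen in the walkthrough above would come out of exactly this kind of ratio, before the semantic summary and LLM reasoning stages run.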
☁️ Agentic Cloud Orchestrator
Azure Network Watcher packet capture lifecycle manager. Creates captures, burst-polls
provisioning status, downloads the .cap blob, invokes the PCAP Forensic Engine,
and cleans up Azure resources — all as a single audited task. Handles Azure’s platform
constraints directly: one active capture per VM, --location required for all
non-create operations, orphan detection and cleanup across sessions.
Wire it into any Python automation that needs to create, monitor, analyze, and clean up Azure Network Watcher captures as a single operation.
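The burst-polling step can be sketched as a small loop with an injectable status function; in practice `get_status` would wrap `az network watcher packet-capture show`, and the state names and intervals here are illustrative:

```python
import time

def burst_poll(get_status, timeout_s=300, interval_s=5, sleep=time.sleep):
    """Poll capture provisioning until it reaches a terminal state.
    Raises TimeoutError if the capture never settles."""
    waited = 0
    while waited <= timeout_s:
        status = get_status()
        if status in ("Succeeded", "Failed", "Stopped"):
            return status
        sleep(interval_s)
        waited += interval_s
    raise TimeoutError("capture did not reach a terminal state")
```

Making `sleep` and `get_status` parameters keeps the lifecycle logic unit-testable without touching Azure.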
Technical Challenges That Made This Hard
1. The Cloud API Is Not One Interface
Azure’s CLI and REST API diverge on write operations. The CLI flattens what the REST API expects nested, and the difference is silent:
az network vnet subnet update --route-table ""
What you expect: route table association → null
What the CLI sends: /subscriptions/.../routeTables/ ← empty name, rejected silently
For “remove association” operations: az rest GET the resource in its raw nested
form, strip the field in one line of Python, az rest PUT it back. The CLI abstraction
misleads; step around it.
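A sketch of that GET/strip/PUT workaround, assuming the ARM subnet schema nests the association under `properties.routeTable` (the api-version and wrapper function are placeholders, not the project's actual code):

```python
import json
import subprocess

def strip_route_table(subnet_json: dict) -> dict:
    """The 'one line of Python': drop the nested association the CLI
    cannot null out. Key path mirrors the ARM subnet schema."""
    subnet_json.get("properties", {}).pop("routeTable", None)
    return subnet_json

def remove_association(subnet_id: str, api_version: str = "2023-09-01"):
    """Hypothetical wrapper: GET the raw nested resource via az rest,
    strip the field, PUT it back."""
    url = f"https://management.azure.com{subnet_id}?api-version={api_version}"
    raw = json.loads(subprocess.run(
        ["az", "rest", "--method", "get", "--url", url],
        capture_output=True, text=True, check=True).stdout)
    body = json.dumps(strip_route_table(raw))
    subprocess.run(["az", "rest", "--method", "put", "--url", url, "--body", body],
                   check=True)
```

Working at the raw resource level sidesteps the CLI's flattening entirely, which is the whole point of the workaround.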
The same command group can require different mandatory parameters per subcommand.
az network watcher packet-capture create requires --resource-group. Every other
subcommand in that group — show, delete, list — requires --location instead.
The only way to discover this is to test every operation end-to-end, not just creation.
2. LLM State Machine Reliability
A transient rate-limit error on turn 1 can silently skip state initialization. The agent’s working hypothesis list is never created. On turn 2, the agent checks “are all hypotheses resolved?” — yes, the list is empty — and signals investigation complete with zero evidence collected.
Turn 1: [API rate limit — response dropped]
→ hypothesis list never initialized: []
Turn 2: "Are all hypotheses resolved?" → yes (empty list)
→ "Investigation complete" — no evidence gathered
The fix is a recovery invariant in the system prompt: “If you arrive at a turn where the hypothesis list is empty and the investigation has not concluded, re-initialize it before any other action.” This rule must be written into the specification — it cannot be assumed.
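The same invariant can also be enforced defensively in the loop code itself, as a belt alongside the prompt's braces. A sketch, with an illustrative state shape (not the agent's actual internal representation):

```python
def enforce_hypothesis_invariant(state: dict) -> dict:
    """Guard against the empty-list trap: an empty hypothesis list on an
    unconcluded investigation means turn-1 initialization was lost
    (e.g. to a rate limit), so flag for re-initialization instead of
    letting 'all resolved' be vacuously true."""
    if not state.get("concluded") and not state.get("hypotheses"):
        state["needs_reinit"] = True
    return state

def investigation_complete(state: dict) -> bool:
    if state.get("needs_reinit"):
        return False
    hyps = state.get("hypotheses", [])
    # Require at least one hypothesis: an empty list is never "complete".
    return bool(hyps) and all(h["status"] in ("refuted", "confirmed") for h in hyps)
```

The `bool(hyps)` term is the code-level equivalent of the prompt rule: completeness over an empty set is exactly the bug being prevented.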
3. Deterministic Safety Over Probabilistic Safety
Asking the LLM to classify its own proposed commands is tempting — the model understands semantics. But it makes the safety gate dependent on the most unpredictable component in the system.
LLM proposes action
│
▼
┌─────────────────────────────────┐
│ Deterministic classifier │ ← allowlist, verb match, pattern rules
└─────────────────────────────────┘
│ │
▼ ▼
APPROVED DENIED
(action runs) (LLM notified;
never the decider)
The LLM reasons about what to do. Deterministic logic decides whether it is safe to do it. These are different questions that belong in different parts of the system.
4. Context Pollution in Agentic Prompts
Any resource name present in the investigation prompt becomes a candidate for the agent’s reasoning — including as the target of its own operational actions.
Prompt: "Storage account nwlogs080613 is unreachable..."
↑
Agent uses this for:
[1] investigation target ✓
[2] packet capture destination ✗
In one scenario, the investigation target storage account was the same account used to store packet captures. The agent, reading the prompt, routed its own capture outputs to the locked-down account and failed. The fix: separate the subject under investigation from the agent’s operational infrastructure at the naming level, the prompt level, and via CLI argument injection that the agent cannot confuse with user-provided context.
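One way to sketch that separation: operational infrastructure arrives as CLI arguments and is never interpolated into the text the model reasons over (the flag name and context shape here are illustrative, not Ghost Agent's actual interface):

```python
import argparse

def build_context(argv: list, user_prompt: str) -> dict:
    """Keep the agent's own infrastructure out of the conversational
    context: it arrives via CLI flags and stays model-invisible, so a
    storage account named in the prompt can only ever be a subject of
    investigation, never a capture destination."""
    p = argparse.ArgumentParser()
    p.add_argument("--capture-storage-account", required=True)
    ops = p.parse_args(argv)
    return {
        "prompt": user_prompt,  # model-visible: the subject only
        "operational": {        # model-invisible: the agent's own plumbing
            "capture_storage": ops.capture_storage_account,
        },
    }
```

Because the capture destination never enters the prompt, the confusion seen in the scenario above becomes structurally impossible rather than merely discouraged.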
Conclusion
What Ghost Agent removes is the overhead of remembering which layer to check next and the risk of stopping too early. It does not replace the engineer — it requires your explicit approval before every risky action and produces a full audit trail of every decision. What it replaces is the manual, sequential, expertise-dependent process that produces a different outcome depending on who is on call.
The tool is open source, built with standard components (Python 3.12, Gemini via google-genai, Azure CLI), and deployable against any Azure subscription where you have read access to network resources. The field lessons that emerged from building it are published alongside the code.
GitHub: github.com/ranga-sampath/agentic-network-tools
Clone the repo. Point it at a resource group. Describe a symptom.
