You Checked the NSG. You Checked the Wrong One.

For network engineers, cloud architects, and the product leaders who support them.

The NIC NSG shows AllowVnetInBound. The connection times out. An engineer who checks only the NIC NSG will confirm the allow rule and still not find the block. It is in the subnet NSG — evaluated first for inbound traffic, invisible to any investigation that queries a single NSG in isolation.

The Problem

Azure NSG evaluation applies two sequential gates. For inbound traffic, the subnet NSG is evaluated first and the NIC NSG second. A deny in the subnet NSG at priority 200 blocks every matching packet before the NIC NSG is reached, regardless of what the NIC NSG permits.

Two properties of how Azure surfaces NSG state make this failure pattern repeat:

The portal shows each NSG in isolation. There is no combined view. When an engineer opens the NIC NSG and sees AllowVnetInBound, that finding is accurate. The subnet NSG evaluation that precedes it for inbound traffic is in a separate panel. Nothing in the portal links them.

az network nsg rule list and the portal NSG view both query configured state. The combined evaluation result (which gate fires first, which rule is decisive, and what policy is actually enforced at the NIC) only appears via az network nic list-effective-nsg. That command is missing from many runbooks, so most investigations stop at the single-NSG view.
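For runbooks that script this step, a minimal Python sketch of the effective-state query might look like the following. The function names are illustrative and are not taken from the tool's providers.py. The sketch assumes an authenticated Azure CLI on the PATH and passes an argument vector rather than a shell string.

```python
import json
import subprocess


def effective_nsg_command(nic_name: str, resource_group: str) -> list[str]:
    """Build the argument vector for the effective-NSG query (no shell involved)."""
    return [
        "az", "network", "nic", "list-effective-nsg",
        "--name", nic_name,
        "--resource-group", resource_group,
        "--output", "json",
    ]


def fetch_effective_nsg(nic_name: str, resource_group: str) -> dict:
    """Run the query and parse its JSON; needs `az login` to have succeeded."""
    result = subprocess.run(
        effective_nsg_command(nic_name, resource_group),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit (e.g. missing RBAC)
    )
    return json.loads(result.stdout)
```

Calling fetch_effective_nsg("tf-dest-vm94_z2", "nw-forensics-rg") would return the combined subnet and NIC rule sets for that NIC in a single response.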

A second, quieter problem accumulates over time. NSGs are touched by multiple teams across a workload’s lifetime. After 18 months, no single team member can confidently describe every rule. Wildcard-source ALLOW rules may have been added during an incident and never cleaned up. Rules may exist that can never fire because a higher-priority rule above them covers the same traffic. No standard Azure tool surfaces a compliance view: unreachable rules, overly permissive custom ALLOW entries, gates relying entirely on defaults.
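To make the "rule that can never fire" case concrete, here is a simplified sketch of shadow detection. It is not the tool's implementation: the Rule class, its field names, and the rule names in the usage note are hypothetical, and the sketch assumes one IPv4 source CIDR and one destination port range per rule, all in the same direction.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class Rule:  # hypothetical shape, for illustration only
    name: str
    priority: int
    protocol: str            # "Tcp", "Udp", or "*"
    source: str              # a single IPv4 CIDR such as "10.0.1.0/24"
    ports: tuple[int, int]   # inclusive (low, high) destination port range


def covers(upper: Rule, lower: Rule) -> bool:
    """True when every packet matched by `lower` is also matched by `upper`."""
    proto_ok = upper.protocol in ("*", lower.protocol)
    src_ok = ip_network(lower.source).subnet_of(ip_network(upper.source))
    ports_ok = upper.ports[0] <= lower.ports[0] and lower.ports[1] <= upper.ports[1]
    return proto_ok and src_ok and ports_ok


def shadowed_rules(rules: list[Rule]) -> list[tuple[str, str]]:
    """Return (shadowed_rule, shadowing_rule) name pairs."""
    ordered = sorted(rules, key=lambda r: r.priority)  # lower number is evaluated first
    findings = []
    for i, lower in enumerate(ordered):
        for upper in ordered[:i]:
            if covers(upper, lower):
                findings.append((lower.name, upper.name))
                break  # one shadowing rule is enough to make `lower` unreachable
    return findings
```

With a deny for 10.0.0.0/16 on TCP 5432 at priority 200 and an allow for 10.0.1.0/24 on the same port at priority 300, the allow is reported as shadowed: it sits in the rule list but can never be the decisive rule.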

These two failure modes are separate. One is a P1 investigation scenario. The other is a security review scenario. This post covers both.

How Teams Handle It Today

The two scenarios below are independent of each other.

Scenario 1 — P1 connectivity investigation:

P1: tf-source-vm cannot reach PostgreSQL on tf-dest-vm. Connections time out.
  → Engineer opens portal → NIC NSG → finds AllowVnetInBound at priority 65000
      → "NSG is clean. Port 5432 is allowed."
  → Escalates: "This is a DB or application issue"
      → DB team checks: PostgreSQL is up, accepting connections locally
  → Investigation bounces between teams
      → The subnet NSG was associated with the default subnet earlier that day.
        A Deny rule for TCP:5432 at priority 200 was added there.
        For inbound traffic, Azure evaluates the subnet NSG first.
        The NIC NSG is never reached.
      → The NIC NSG AllowVnetInBound the engineer found was accurate.
        It was evaluated at Gate 2 — which the subnet deny at Gate 1 prevented
        from ever firing.

The investigation stops at the NIC NSG. Whether the subnet NSG gets checked depends on the team’s runbook and whether they have seen this failure pattern before. The effective combined evaluation — az network nic list-effective-nsg — is what would have found it.

Scenario 2 — Security compliance review (unrelated to Scenario 1):

Security team flags a VM for compliance review.
Production for 18 months. NSG touched by multiple teams.
  → Engineer exports NSG rules to review
      → Finds SSH allowed from any source
        Finds port 5001 open from any source
        "What is ghost-demo-temp-rdp-access? RDP from any source?"
  → "Was this intentional? Who added it?"
      → Nobody remembers. The rule name has "temp" in it.
        The VM is Linux — no RDP service is running.
        No monitoring alert ever fired. No connectivity test ever tripped over it.
  → The subnet has no NSG associated.
    All inbound traffic goes directly to the NIC NSG.
    No second layer of control.

The structural failure in both cases: the effective NSG state — the combined result of both gates, in evaluation order — is never queried directly. Engineers work from individual NSG views and infer the combined behavior. That inference is frequently incomplete.

What Security Rule Inspector Does

Security Rule Inspector queries az network nic list-effective-nsg — the single Azure CLI command that returns both the subnet NSG and the NIC NSG state at a given NIC — and applies the dual-gate evaluation model in pure Python.

Verdict mode answers a binary question for a specific traffic flow: was 10.0.1.4 → tf-dest-vm:5432 TCP inbound blocked? Which gate fired the decisive rule? What is the final ALLOW, DENY, or INDETERMINATE verdict?

Audit mode produces a full rule inventory and findings: shadowed rules, overly permissive custom ALLOW rules, and gates relying entirely on Azure defaults.

┌──────────────────────────────────────────────────────────────────────────┐
│  Ghost Agent  (network-ghost-agent/ghost_agent.py)                       │
│                                                                          │
│  _run_security_rule_inspector_handler()                                  │
│   generates: session_id → nsg_{session_id}                               │
│   invokes: subprocess (security_rule_inspector.py + args)                │
│   reads artifact: {AUDIT_DIR}/nsg_{session_id}_verdict.json              │
│               or  {AUDIT_DIR}/nsg_{session_id}_audit.json                │
└────────────────────────────────┬─────────────────────────────────────────┘
                                 │ subprocess
                                 ▼
┌──────────────────────────────────────────────────────────────────────────┐
│  security_rule_inspector.py  (CLI entry point + orchestrator)            │
│                                                                          │
│  [1] Validate     inputs; detect mode (verdict / audit);                 │
│         │         enforce nsg_ prefix; check session ID collision        │
│         │                                                                │
│  [2] Collect      providers.py → resolve primary NIC name,               │
│         │         then az network nic list-effective-nsg for that NIC;   │
│         │         write raw artifact to audit_dir                        │
│         │                                                                │
│  [3] Preprocess   nsg_preprocessor.py ← path to raw artifact             │
│         │         → normalised rule sets (subnet + NIC, both directions) │
│         │                                                                │
│  [4] Evaluate ────┬─────────────────────────────────────────────────     │
│                   │                                                      │
│           [verdict mode]                      [audit mode]               │
│           nsg_engine.evaluate_verdict()        nsg_engine.audit()        │
│           → gate verdicts, decisive rule,      → full rule inventory,    │
│             shadowed rules, final verdict        posture findings        │
│             write verdict artifact               write audit artifact    │
│                                                                          │
│  [5] Output       print human-readable table to stdout                   │
│                   (artifact already written; output derived from it)     │
│                                                                          │
│  ┌──────────────┐  ┌──────────────────────┐  ┌────────────────────────┐  │
│  │ providers.py │  │ nsg_preprocessor.py  │  │ nsg_engine.py          │  │
│  │ Azure CLI    │  │ Normalise rule sets  │  │ Dual-gate evaluation   │  │
│  │ boundary     │  │ Expand multi-value   │  │ Rule matching          │  │
│  │ (read-only)  │  │ fields               │  │ Shadow detection       │  │
│  └──────┬───────┘  └──────────────────────┘  │ Permissive detection   │  │
│         │ arg vector, no shell interpolation └────────────────────────┘  │
└─────────┼────────────────────────────────────────────────────────────────┘
          ▼
┌─────────────────────────────────────────────────────────┐
│  Azure Control Plane (read-only)                        │
│  Query 1: VM primary NIC identity (az vm show)          │
│  Query 2: Effective NSG state at the NIC                │
│           (az network nic list-effective-nsg)           │
└─────────────────────────────────────────────────────────┘

Three design decisions are worth understanding:

The evaluation engine is a pure function. nsg_engine.py has no I/O and no side effects. The verdict is produced by a deterministic algorithm applied to structured inputs — not by AI inference. Ghost Agent Brain synthesises the verdict with routing findings to produce the RCA. The verdict itself is a fact.

Raw artifact written before evaluation. The exact bytes Azure returned are written to nsg_{session_id}_raw.json before the preprocessor or engine touches them. The evaluation can be re-run against the raw file without re-querying Azure. The audit trail captures what Azure said, not what the tool inferred.
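The ordering described above can be sketched as follows. This is an illustration under assumptions, not the tool's code: the function name is hypothetical, and session_id is assumed to already carry the nsg_ prefix, as in the artifact names shown later in this post.

```python
import json
from pathlib import Path


def persist_raw_then_parse(raw_bytes: bytes, audit_dir: Path, session_id: str) -> dict:
    """Write Azure's response verbatim, then parse from the file just written."""
    raw_path = audit_dir / f"{session_id}_raw.json"
    raw_path.write_bytes(raw_bytes)           # exact bytes, before any processing
    return json.loads(raw_path.read_bytes())  # later stages read the file, not memory
```

Because parsing starts from the file rather than from the in-memory response, re-running the evaluation against an old raw artifact follows the same code path as a live run.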

Inbound dst_ip derived from the NIC when omitted. For inbound verdict mode, --dst-ip may be omitted. The tool calls az network nic show to derive the VM’s own NIC private IP — which is always the correct inbound destination for traffic evaluated at that NIC.

The Dual-Gate Model

The evaluation order depends on traffic direction. It is encoded in the engine, not inferred.

INBOUND TRAFFIC                           OUTBOUND TRAFFIC
────────────────                          ─────────────────
Gate 1 — Subnet NSG                       Gate 1 — NIC NSG
  ↓ if ALLOW                                ↓ if ALLOW
Gate 2 — NIC NSG                          Gate 2 — Subnet NSG
  ↓                                          ↓
Final verdict                             Final verdict

Gate 1 DENY → Gate 2 skipped             Gate 1 DENY → Gate 2 skipped
Gate 1 INDETERMINATE → Gate 2 skipped    Gate 1 INDETERMINATE → Gate 2 skipped

A lower priority number means higher precedence. A rule at priority 200 fires before a rule at priority 65000. When the subnet NSG contains a deny for TCP:5432 at priority 200, no matching inbound traffic reaches the NIC NSG — regardless of what the NIC NSG permits.
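Within a single gate, "lower number wins" amounts to a first-match scan over rules sorted by priority. The sketch below illustrates that idea only. The dictionary keys are hypothetical, each rule is assumed to have one source CIDR and one destination port range, and the CIDR used for AllowVnetInBound in the usage note is an assumed expansion of the VirtualNetwork tag.

```python
from ipaddress import ip_address, ip_network


def first_match(rules, src_ip, dst_port, protocol):
    """Return the decisive rule: the matching rule with the lowest priority number."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        proto_ok = rule["protocol"] in ("*", protocol)
        src_ok = ip_address(src_ip) in ip_network(rule["source"])
        low, high = rule["ports"]
        if proto_ok and src_ok and low <= dst_port <= high:
            return rule
    return None  # not expected when Azure's default rules are part of the list
```

Given a Deny for 10.0.1.0/24 on TCP 5432 at priority 200 and AllowVnetInBound (assumed 10.0.0.0/16, any port) at priority 65000, first_match(rules, "10.0.1.4", 5432, "Tcp") returns the priority 200 deny, while the same source on port 22 falls through to AllowVnetInBound.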

The final verdict follows a defined combination table:

Gate 1	Gate 2	Final verdict
DENY	— (not evaluated)	DENY
ALLOW	ALLOW	ALLOW
ALLOW	DENY	DENY
ALLOW	INDETERMINATE	INDETERMINATE
INDETERMINATE	— (not evaluated)	INDETERMINATE
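The gate order and the combination table can be expressed in a few lines. This is a sketch of the logic in the table, with hypothetical function names rather than the signatures used in nsg_engine.py. Gate 2 is passed as a callable so that it is only evaluated when Gate 1 returns ALLOW.

```python
from typing import Callable, Optional, Tuple


def gate_order(direction: str) -> Tuple[str, str]:
    """Inbound: subnet then NIC. Outbound: NIC then subnet."""
    return ("subnet", "nic") if direction.lower() == "inbound" else ("nic", "subnet")


def combine(gate1: str, evaluate_gate2: Callable[[], str]) -> Tuple[str, Optional[str]]:
    """Return (final_verdict, gate2_verdict); gate2_verdict is None when skipped."""
    if gate1 in ("DENY", "INDETERMINATE"):
        return gate1, None      # Gate 2 is never evaluated
    gate2 = evaluate_gate2()    # only reached when Gate 1 is ALLOW
    return gate2, gate2         # with Gate 1 ALLOW, Gate 2 decides the outcome
```

Every row of the table follows from one observation: once Gate 1 is ALLOW, the final verdict is whatever Gate 2 returns, and in every other case Gate 1's verdict is final.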

Use Cases, Both Tested

Use Case	What breaks	What the investigation reveals
U — “The Hidden Gate”	tf-source-vm cannot reach PostgreSQL on tf-dest-vm on port 5432. Connections time out. Service team confirms the DB is up.	Subnet NSG inbound deny at priority 200 is Gate 1. NIC NSG never evaluated. Ghost Agent identifies the blocking rule, the gate, and the relevant change earlier that day.
V — “The Open Doorway”	Security team flags tf-dest-vm for compliance review. NSG has been touched by multiple teams over 18 months.	Audit surfaces three permissive NIC NSG inbound rules with wildcard source and destination, including a “temp” RDP rule at priority 500 open from the internet. No subnet NSG associated — the subnet gate is entirely absent.

Use Case U — “The Hidden Gate”

Ghost Agent confirms the routing path is correct (VNetLocal route present, no blackhole). It calls inspect_nsg in verdict mode with the traffic tuple from the P1 description:

python security_rule_inspector.py \
  --vm-name tf-dest-vm \
  --resource-group nw-forensics-rg \
  --src-ip 10.0.1.4 \
  --dst-port 5432 \
  --proto tcp \
  --direction inbound

From the actual verdict artifact nsg_20260415_075226_verdict.json:

{
  "session_id": "nsg_20260415_075226",
  "mode": "verdict",
  "vm_name": "tf-dest-vm",
  "nic_name": "tf-dest-vm94_z2",
  "traffic": {
    "src_ip": "10.0.1.4",
    "dst_port": 5432,
    "protocol": "Tcp",
    "direction": "Inbound"
  },
  "gate_order": ["subnet", "nic"],
  "gate1": {
    "gate": "subnet",
    "verdict": "DENY",
    "decisive_rule": {
      "name": "securityRules/ghost-demo-block-pgsql",
      "priority": 200,
      "access": "Deny",
      "protocol": "Tcp",
      "source_address": "10.0.1.0/24",
      "destination_ports": ["5432-5432"]
    },
    "evaluated": true
  },
  "gate2": {
    "gate": "nic",
    "verdict": null,
    "evaluated": false,
    "skip_reason": "PRIOR_GATE_DENY"
  },
  "final_verdict": "DENY"
}

From the actual Ghost Agent investigation report ghost_report_ghost_20260415_075210.md:

Root Cause
NSG rule 'securityRules/ghost-demo-block-pgsql' on the subnet is blocking TCP traffic
on destination port 5432 from source address 10.0.1.0/24 to tf-dest-vm.

Hypotheses
H1  An NSG rule is blocking traffic from tf-source-vm to tf-dest-vm on port 5432.  Confirmed
H2  A route table entry is preventing traffic from reaching tf-dest-vm.             Refuted
H3  An OS-level firewall on tf-dest-vm is blocking connections to port 5432.        Refuted
H4  The PostgreSQL service is not listening on the network interface.               Refuted

Recommended Actions
1. Remove the NSG rule 'securityRules/ghost-demo-block-pgsql' from the subnet to allow
   traffic to tf-dest-vm on port 5432.
2. Alternatively, modify the rule to allow traffic from tf-source-vm (10.0.1.4) specifically.

The NIC NSG that the engineer checked — the one that shows AllowVnetInBound — never participated in the evaluation. Gate 2 shows "evaluated": false. Ghost Agent confirmed routing was correct (H2 refuted), checked the OS firewall (H3 refuted), and confirmed the service was up (H4 refuted) — all before reaching the structured DENY verdict from the tool.

Use Case V — “The Open Doorway”

Ghost Agent recognises a security posture question and calls inspect_nsg in audit mode — no traffic tuple, just the VM and resource group.

python security_rule_inspector.py \
  --vm-name tf-dest-vm \
  --resource-group nw-forensics-rg

From the actual audit artifact nsg_20260415_080221_audit.json — permissive findings:

"permissive_rules": [
  {
    "rule": {
      "name": "securityRules/AllowPipeMeter5001",
      "priority": 190,
      "access": "Allow",
      "protocol": "Tcp",
      "source_address": "0.0.0.0/0, ::/0",
      "destination_ports": ["5001-5001"]
    },
    "gate": "nic",
    "direction": "Inbound",
    "wildcard_dimensions": ["source", "destination"]
  },
  {
    "rule": {
      "name": "securityRules/SSH",
      "priority": 300,
      "access": "Allow",
      "protocol": "Tcp",
      "source_address": "0.0.0.0/0, ::/0",
      "destination_ports": ["22-22"]
    },
    "gate": "nic",
    "direction": "Inbound",
    "wildcard_dimensions": ["source", "destination"]
  },
  {
    "rule": {
      "name": "securityRules/ghost-demo-temp-rdp-access",
      "priority": 500,
      "access": "Allow",
      "protocol": "Tcp",
      "source_address": "0.0.0.0/0, ::/0",
      "destination_ports": ["3389-3389"]
    },
    "gate": "nic",
    "direction": "Inbound",
    "wildcard_dimensions": ["source", "destination"]
  }
]

Default-only gates from the same artifact:

"default_only_gates": [
  { "gate": "subnet", "direction": "Inbound",  "nsg_absent": true },
  { "gate": "subnet", "direction": "Outbound", "nsg_absent": true },
  { "gate": "nic",    "direction": "Outbound", "nsg_absent": false }
]

From the actual Ghost Agent investigation report ghost_report_ghost_20260415_080159.md:

Root Cause
The effective NSG rules on tf-dest-vm are overly permissive. The VM's NIC NSG allows
inbound traffic from any source to ports 5001, 22, and 3389. The subnet has no NSG
associated, making it even more exposed. The audit ID is nsg_20260415_080221.

Recommended Actions
1. Remove the permissive inbound NSG rules allowing traffic from any source to ports
   5001, 22, and 3389. Replace them with rules that only allow traffic from known and
   trusted sources.
2. Create and associate an NSG at the subnet level to provide an additional layer of
   security.
3. After updating the NSG rules, re-run the NSG audit to verify the changes.

The ghost-demo-temp-rdp-access rule has “temp” in its name — it was added during a troubleshooting session and never removed. The VM is Linux. No RDP service is listening on port 3389. No connectivity test ever trips over it. No monitoring alert ever fires. It surfaces only in a full effective rule audit. The nsg_absent: true entries show no subnet NSG is associated — all inbound traffic goes directly to the NIC NSG, with no second gate.
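A permissive-rule check over records shaped like the permissive_rules entries above could be sketched as follows. The function and constant names are hypothetical, and the sketch inspects the source dimension only, whereas the artifact also reports the destination dimension. It relies on Azure's priority ranges: custom rules use 100 to 4096, and default rules sit at 65000 and above.

```python
WILDCARD_SOURCES = {"*", "0.0.0.0/0", "::/0"}
MAX_CUSTOM_PRIORITY = 4096  # default rules (65000+) are excluded from this finding


def is_permissive(rule: dict) -> bool:
    """Flag custom Allow rules whose source includes an any-address wildcard."""
    if rule["access"] != "Allow" or rule["priority"] > MAX_CUSTOM_PRIORITY:
        return False
    # The artifact stores sources as one comma-separated string
    sources = {part.strip() for part in rule["source_address"].split(",")}
    return bool(sources & WILDCARD_SOURCES)
```

Applied to the three rules listed in the artifact (priorities 190, 300, and 500), the check returns True for each. A default rule such as AllowVnetInBound at priority 65000 is not flagged.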

What Made This Hard to Build Right

INDETERMINATE Is Not an Error — It Is a First-Class Verdict

NSG rules can reference Application Security Groups (ASGs) as source or destination addresses. ASG membership — the set of IPs that belong to the ASG — is not present in the effective NSG JSON. It requires a separate Azure query per ASG, which itself may fail for ASGs in peered VNets.

The naive approach is to skip rules that reference ASGs and continue evaluating lower-priority rules. That approach produces incorrect verdicts. If a rule at priority 300 references an ASG and the next rule with a definitive CIDR match is at priority 500, skipping the ASG rule and returning the verdict from priority 500 silently discards the possibility that the ASG rule matched first.

The engine returns INDETERMINATE and stops when it encounters an unresolvable rule. Evaluation does not continue to lower-priority rules. Gate 2 is not evaluated when Gate 1 is INDETERMINATE.

Gate 1 (Subnet NSG — evaluated first for inbound):
  Rule halted:    allow-app-servers-inbound (priority 300)
                  source references ASG "app-servers" — membership not available
                  in effective NSG JSON
  Decision:       INDETERMINATE
  Gate 2:         Not evaluated (prior gate INDETERMINATE)

Unresolvable:   allow-app-servers-inbound (priority 300)
                  Provide ASG member IP addresses to resolve.

This is fail-closed behavior. An INDETERMINATE verdict surfaces the unresolvable rule to the engineer and to Ghost Agent — who then knows to request the missing information rather than proceeding on a false premise.
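The difference between skipping and halting is easiest to see in code. The sketch below is a simplified illustration, not the engine itself: the dictionary keys (including source_asg) are hypothetical, and it halts at any ASG-referencing rule it reaches, which is the literal reading of the behavior described above. The real engine may check other dimensions before deciding a rule is unresolvable.

```python
from ipaddress import ip_address, ip_network


def evaluate_gate(rules, src_ip, dst_port, protocol):
    """Return (verdict, rule), halting at the first unresolvable rule reached."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if rule.get("source_asg"):
            # ASG membership is unknown: stop here instead of skipping ahead
            return "INDETERMINATE", rule
        low, high = rule["ports"]
        if (rule["protocol"] in ("*", protocol)
                and low <= dst_port <= high
                and ip_address(src_ip) in ip_network(rule["source"])):
            return rule["access"].upper(), rule
    return None, None  # not expected when Azure's default rules are in the list
```

With an ASG-based allow at priority 300 and a CIDR-based deny at priority 500, the result is INDETERMINATE, not DENY. If a definitive CIDR match exists at priority 200, evaluation never reaches the ASG rule and the verdict is definitive.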

The three standard Azure service tags (VirtualNetwork, Internet, AzureLoadBalancer) are handled by the preprocessor, which reads the expandedSourceAddressPrefix field containing their resolved CIDRs. Non-standard service tags (e.g., Storage.EastUS) are not present in the effective NSG JSON and are classified as INDETERMINATE on the same principle.
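A sketch of how a preprocessor might turn one effective rule's source into CIDRs, or report that it cannot, is shown below. The function name is hypothetical and this is not the code in nsg_preprocessor.py. It assumes a single sourceAddressPrefix per rule and treats any non-CIDR prefix without an expansion as unresolved.

```python
from ipaddress import ip_network
from typing import List, Optional


def resolve_sources(rule: dict) -> Optional[List[str]]:
    """Return source CIDRs for a rule, or None when they cannot be resolved."""
    expanded = rule.get("expandedSourceAddressPrefix")
    if expanded:
        return list(expanded)            # tag already resolved in the effective JSON
    prefix = rule.get("sourceAddressPrefix", "")
    if prefix == "*":
        return ["0.0.0.0/0", "::/0"]     # any source, IPv4 and IPv6
    try:
        ip_network(prefix, strict=False)
    except ValueError:
        return None                      # tag or ASG without CIDRs: unresolved
    return [prefix]
```

A None result is the signal to classify the rule as unresolvable, which leads to an INDETERMINATE verdict if evaluation reaches that rule.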

Where It Fits in Ghost Agent’s Investigation Sequence

Security Rule Inspector is the second of three layers in Ghost Agent’s Azure connectivity investigation sequence, after route confirmation and before the host firewall check:

Ghost Agent — Azure Connectivity Investigation Sequence
══════════════════════════════════════════════════════════════════════

  Symptom described in plain English
        │
        ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │  Route path confirmation                                         │
  │  effective_route_inspector                                       │
  │  → VNetLocal present? No blackhole? Correct next-hop?            │
  └──────┬───────────────────────────────────────────────────────────┘
         │ routing is correct
         ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │  NSG evaluation             ← security_rule_inspector            │
  │  inspect_nsg                                                     │
  │  → Which gate fired? What rule is decisive? ALLOW or DENY?       │
  └──────┬───────────────────────────────────────────────────────────┘
         │ NSG is ALLOW
         ▼
  ┌─────────────────────────────────────────────────────────────────┐
  │  Host firewall check                                            │
  │  detect_config_drift → firewall_inspector.py                    │
  │  → iptables/nftables rules inside the guest OS                  │
  └─────────────────────────────────────────────────────────────────┘

Each layer is a precondition for the next. A routing blackhole drops the packet before the NSG is reached, so querying NSG rules against route-dropped traffic can produce a misleading ALLOW verdict. An NSG ALLOW does not guarantee the application receives traffic: iptables and nftables inside the VM operate independently of Azure’s network stack and can silently drop packets that Azure passed. In the Use Case U investigation above, Ghost Agent refuted H2 (routing) and H3 (OS firewall) explicitly; the investigation reached the NSG layer because the evidence warranted it.

Ghost Agent reads the structured JSON artifact at the deterministic path it constructed before the subprocess ran. It does not parse stdout. The handler returns mode, final verdict, gate results, decisive rule, shadowed rules, and session ID to the Brain, which synthesises an RCA.
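The artifact hand-off can be sketched as follows. The helper names are hypothetical and this is not the handler's actual code. The sketch assumes session_id already carries the nsg_ prefix, as in the artifact names shown earlier, and it reads only keys that appear in the verdict artifact above.

```python
import json
from pathlib import Path


def artifact_path(audit_dir: str, session_id: str, mode: str) -> Path:
    """Path is fully determined by inputs known before the subprocess starts."""
    return Path(audit_dir) / f"{session_id}_{mode}.json"


def load_verdict_summary(path: Path) -> dict:
    """Read the structured artifact; stdout is never parsed."""
    artifact = json.loads(path.read_text(encoding="utf-8"))
    return {
        "session_id": artifact["session_id"],
        "mode": artifact["mode"],
        "final_verdict": artifact["final_verdict"],
        "gates": [artifact["gate1"], artifact["gate2"]],
    }
```

Constructing the path before the subprocess runs means a missing file is an unambiguous failure signal, with no dependence on how the human-readable table is formatted.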

What It Takes to Run

Prerequisites:

Python 3.12+
Azure CLI authenticated: az login
RBAC permissions:

Permission	Required for
`Microsoft.Compute/virtualMachines/read`	NIC name resolution via `az vm show`
`Microsoft.Network/networkInterfaces/effectiveNetworkSecurityGroups/action`	`az network nic list-effective-nsg`

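Network Contributor grants effectiveNetworkSecurityGroups/action through Microsoft.Network/*, but it does not include Microsoft.Compute/virtualMachines/read, so pair it with Reader or another role that can read the VM. The built-in Reader role on its own does not include effectiveNetworkSecurityGroups/action: a gap that is easy to miss and that surfaces as an authorization error in the CLI output, not a clean 403.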

Verdict mode:

# Full traffic tuple — src IP, dst port, protocol, direction
python security_rule_inspector.py \
  --vm-name tf-dest-vm \
  --resource-group nw-forensics-rg \
  --src-ip 10.0.1.4 \
  --dst-port 5432 \
  --proto tcp \
  --direction inbound

# --dst-ip may be omitted for inbound — derived from the VM's own NIC private IP

Audit mode:

python security_rule_inspector.py \
  --vm-name tf-dest-vm \
  --resource-group nw-forensics-rg

Ghost Agent integration: Set VM_NAME, RESOURCE_GROUP, and AUDIT_DIR in config.env. The inspect_nsg tool calls this as a subprocess, generates the session ID, constructs the artifact path deterministically, and reads the result into the Brain.

What This Tool Does Not Do

It does not diagnose routing failures. A packet dropped by a routing blackhole never reaches an NSG. Route investigation comes before NSG investigation.

It does not inspect the OS-level firewall inside the VM. An NSG ALLOW means Azure’s network stack passed the traffic to the VM’s NIC. iptables and nftables are separate — they are invisible to Azure’s control plane. If the NSG is clean and the connection still fails, the host firewall is the next investigation step.

It does not resolve Application Security Group membership. When evaluation reaches a rule that uses an ASG, the verdict is INDETERMINATE; the tool does not skip the rule and continue. The engineer must supply ASG membership to resolve it.

It does not produce AI-generated verdicts. The verdict is a deterministic output of the dual-gate evaluation algorithm. Ghost Agent Brain produces the narrative and remediation guidance from the structured facts the tool returns.

Conclusion

Security Rule Inspector makes one specific step in a connectivity investigation exact and repeatable: evaluating what Azure NSG policy is actually enforced at a VM’s NIC, for a specific traffic flow or across all flows. It removes the failure mode where investigations stop at the NIC NSG without reaching the subnet NSG that fired first. It removes the manual spreadsheet export for a compliance review by surfacing shadowed rules, permissive entries, and absent subnet gates in a structured, machine-readable result. The dual-gate model, the evaluation order, and the decisive rule are deterministic outputs.

For engineering leaders: the NSG investigation step that previously depended on whether the on-call engineer had seen this failure pattern before is now a two-command, sub-minute operation with a structured result that tells the agent what to do next.

GitHub: github.com/ranga-sampath/agentic-network-tools

Clone the repo. Run it against a VM with a P1 connectivity failure. Check whether the subnet NSG fired before the NIC NSG you were looking at.