The Route Table Checked Out. The Packet Never Arrived.

For network engineers, cloud architects, and the product leaders who support them.


Azure’s route table resource shows configured intent. The effective route table at the NIC shows what Azure’s routing subsystem will actually do with a packet. The two are not the same. When they diverge, traffic disappears silently: no ICMP unreachable, no error log, no alert.


The Problem

Four routing failures are common and share one property: none of them shows up when you inspect the route table resource or the NSGs, and none generates an alert or log entry.

Routing blackhole. A route exists with next_hop_type = None. Azure silently drops every matching packet. The NSG is clean. The route table resource looks intact.

Phantom NVA. A UDR routes traffic to a VirtualAppliance that no longer exists. The route stays in the effective table, reflected as a blackhole. No packet reaches the destination or generates an error on either end.

LPM beats source tier. A User-defined /32 host route sits alongside a Default VNetLocal /16. Azure applies longest prefix match unconditionally: the /32 wins regardless of source tier. The VNetLocal route is present and healthy. Azure never uses it for that destination.

Invalid route shadowing. An Invalid route with a longer prefix exists for the destination. Traffic falls back to a less specific path. The investigation finds a valid route winning — but not the one that was intended.

The effective route table is computed state. It is not the route table resource attached to the subnet. Querying the resource tells you what was configured; querying the NIC tells you what Azure enforces.


How Engineers Analyze Azure Routes Today

P1: tf-source-vm cannot reach tf-dest-vm. Silence. No error, no RST.
  → On-call: portal → route table resource → no obvious issue
      → "Route table looks fine."
  → On-call: az network nsg rule list on both NICs → no deny rules
      → "NSG is clean."
  → Ticket opened: "Possible application issue?"
      → App team confirms: service is up, listening locally.
  → Two teams, two hours. Eventually someone runs:
      az network nic show-effective-route-table --name tf-source-vm-nic --resource-group nw-forensics-rg
      → Finds a /32 User route pointing to None on the source NIC
      → That route has been there since a "hardening exercise" six hours ago
      → The VNetLocal route for the subnet was present. Azure never used it
          for 10.0.1.5 because the /32 took precedence via LPM.

The structural failure: engineers query configured state and stop. Applying the route selection algorithm — longest prefix match → source precedence → BGP tie-break — against a real effective route table isn’t intuitive and isn’t part of most runbooks.


What Effective Route Inspector Does

Effective Route Inspector queries az network nic show-effective-route-table and applies the Azure route selection algorithm in deterministic Python. No AI. It returns a structured verdict: which route wins for a specific destination, why it won, and whether any anomaly is present.

Route Selection Algorithm — strict order
══════════════════════════════════════════════════════════════

  Effective route table (from NIC, not subnet route table resource)
         │
  Step 1: CIDR containment
         Active routes only. dst_ip must fall within prefix.
         None match → NO_ROUTE. Stop.
         │
         ▼
  Step 2: Longest Prefix Match
         Highest prefix_length wins. Unconditional.
         /32 Default beats /24 User. /24 User beats /16 Default.
         One route remains → winner. Go to anomaly checks.
         │
         ▼  (multiple routes tied on prefix_length)
  Step 3: Source precedence
         User (1) > VirtualNetworkGateway (2) > Default (3)
         One route remains → winner. Go to anomaly checks.
         │
         ▼  (two or more VirtualNetworkGateway routes remain)
  Step 4: BGP tie-break
         AS Path length is not in the effective route table JSON.
         → TIED_BGP. Stop. Do not guess.

  Anomaly checks on the winner:
    next_hop_type == "None"             → BLACKHOLE_WARNING
    next_hop_type == "VirtualAppliance" → NVA_WARNING
    Invalid route with longer prefix
    covers dst_ip                       → INVALID_SHADOW_WARNING
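
The four steps and the anomaly checks above can be sketched in a few dozen lines of Python. This is a minimal illustration, not the tool's actual lpm_engine.py: the function name select_route, the dict keys, and the SOURCE_PRECEDENCE label are assumptions made for the example (only LPM_ONLY appears in a real verdict in this article). The sketch also reports any tie that survives source precedence as TIED_BGP, although the diagram describes that case only for VirtualNetworkGateway routes.

```python
import ipaddress

# Source precedence from the diagram: a lower rank wins a prefix-length tie.
SOURCE_RANK = {"User": 1, "VirtualNetworkGateway": 2, "Default": 3}


def _network(route):
    return ipaddress.ip_network(route["prefix"], strict=False)


def select_route(routes, dst_ip):
    """Apply containment, LPM, source precedence, then anomaly checks."""
    dst = ipaddress.ip_address(dst_ip)
    covering = [r for r in routes if dst in _network(r)]

    # Step 1: only Active routes that contain dst_ip are candidates.
    active = [r for r in covering if r["state"] == "Active"]
    if not active:
        return {"result": "NO_ROUTE", "winning_route": None,
                "selection_reason": None, "anomaly_warnings": []}

    # Step 2: longest prefix match, applied before source is considered.
    longest = max(_network(r).prefixlen for r in active)
    tied = [r for r in active if _network(r).prefixlen == longest]
    reason = "LPM_ONLY"

    # Step 3: source precedence, only among routes tied on prefix length.
    if len(tied) > 1:
        best = min(SOURCE_RANK.get(r["source"], 99) for r in tied)
        tied = [r for r in tied if SOURCE_RANK.get(r["source"], 99) == best]
        reason = "SOURCE_PRECEDENCE"  # hypothetical label, not from the source

    # Step 4: AS Path is unavailable, so a remaining tie is surfaced, not guessed.
    if len(tied) > 1:
        return {"result": "TIED_BGP", "winning_route": None,
                "selection_reason": None, "tied_candidates": tied,
                "anomaly_warnings": []}

    winner = tied[0]
    warnings = []
    if winner["next_hop_type"] == "None":
        warnings.append("BLACKHOLE_WARNING")
    if winner["next_hop_type"] == "VirtualAppliance":
        warnings.append("NVA_WARNING")
    # An Invalid route more specific than the winner means traffic fell back.
    if any(r["state"] == "Invalid" and _network(r).prefixlen > longest
           for r in covering):
        warnings.append("INVALID_SHADOW_WARNING")

    return {"result": "WINNER", "winning_route": winner,
            "selection_reason": reason,
            "shadowed_candidates": [r for r in active if r is not winner],
            "anomaly_warnings": warnings}
```

Keeping the function free of I/O means it can be exercised with hand-written route lists, which is the property the architecture diagram attributes to the LPM engine.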

Two artifacts written per run:

  • rt_{session_id}_raw.json — verbatim az CLI output
  • rt_{session_id}_verdict.json — structured verdict (read by Ghost Agent Brain)

Audit mode (no --dst-ip): all routes sorted by prefix length descending, invalid routes listed separately, blackhole and NVA routes flagged across the full table.
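
Audit mode as described here could look roughly like the sketch below. The function name audit_routes and the output keys are hypothetical, and the route dicts are assumed to be already normalised (prefix, state, next_hop_type). Blackhole and NVA flags are computed over every route, Invalid ones included, which is one reading of "across the full table".

```python
import ipaddress


def audit_routes(routes):
    """Summarise a normalised route list when no dst_ip is given."""
    def prefix_length(route):
        return ipaddress.ip_network(route["prefix"], strict=False).prefixlen

    # Most specific routes first; sorted() is stable, so ties keep input order.
    ordered = sorted(routes, key=prefix_length, reverse=True)
    return {
        "routes": [r for r in ordered if r["state"] != "Invalid"],
        "invalid_routes": [r for r in ordered if r["state"] == "Invalid"],
        "blackhole_routes": [r for r in ordered
                             if r["next_hop_type"] == "None"],
        "nva_routes": [r for r in ordered
                       if r["next_hop_type"] == "VirtualAppliance"],
    }
```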

┌───────────────────────────────────────────────────────────────────────────┐
│  Ghost Agent  (network-ghost-agent/ghost_agent.py)                        │
│                                                                           │
│  _run_effective_route_inspector_handler()                                 │
│   generates: session_id → rt_{timestamp}                                  │
│   invokes: subprocess (effective_route_inspector.py + args)               │
│   reads: {AUDIT_DIR}/rt_{session_id}_verdict.json                         │
│   returns to Brain: winning_route, selection_reason,                      │
│                     anomaly_warnings, shadowed_candidates, session_id     │
└──────────────────────────┬────────────────────────────────────────────────┘
                           │ subprocess
                           ▼
┌───────────────────────────────────────────────────────────────────────────┐
│  effective_route_inspector.py  (orchestrator)                             │
│                                                                           │
│  [1] Validate → [2] Collect → [3] Preprocess → [4] Analyze → [5] Output   │
│                    write raw          normalise      LPM engine   stdout  │
│                    artifact           route list     write verdict table  │
│                                                                           │
│  providers.py        route_preprocessor.py       lpm_engine.py            │
│  (Azure CLI only)    (normalise, expand)          (pure function, no I/O) │
│  arg vector,                                                              │
│  no shell=True                                                            │
└───────────────────────────────────────────────────────────────────────────┘
          │
          ▼ (read-only)
   az vm show  +  az network nic show-effective-route-table
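
The "arg vector, no shell=True" note in the diagram is worth a short illustration. The sketch below is an assumption about how providers.py might build its two read-only calls; the helper names are hypothetical, and the JMESPath query that picks the first NIC is one possible way to resolve a NIC from a VM, not necessarily the tool's.

```python
import subprocess


def build_nic_lookup_cmd(vm_name, resource_group):
    """argv for resolving the VM's first NIC resource ID (read-only)."""
    return ["az", "vm", "show",
            "--name", vm_name,
            "--resource-group", resource_group,
            "--query", "networkProfile.networkInterfaces[0].id",
            "--output", "tsv"]


def build_effective_route_cmd(nic_name, resource_group):
    """argv for fetching the NIC's effective route table as JSON."""
    return ["az", "network", "nic", "show-effective-route-table",
            "--name", nic_name,
            "--resource-group", resource_group,
            "--output", "json"]


def nic_name_from_id(resource_id):
    """The NIC name is the last path segment of its resource ID."""
    return resource_id.strip().rstrip("/").rsplit("/", 1)[-1]


def run_az(argv, runner=subprocess.run):
    """Run an argv list directly; no shell ever parses the arguments."""
    completed = runner(argv, capture_output=True, text=True, check=True)
    return completed.stdout
```

Because each argument is its own list element, a VM or NIC name containing spaces or shell metacharacters is passed to az as a literal string rather than being interpreted.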

Use Case S — “The Accidental Blackhole”

Setup: A network team attaches a new route table to the VNet subnet during a hardening exercise. The engineer means to route specific traffic through an NVA. The /32 host route for tf-dest-vm gets next-hop-type None by mistake. Every packet from tf-source-vm to 10.0.1.5 hits a User-defined blackhole. Azure silently drops it.

What the initial investigation shows: NSG on both VM NICs — clean. No deny rules. Standard NSG troubleshooting confirms the network policy is correct. This is accurate. The routing layer is the problem, not the NSG.

Prompt given to Ghost Agent:

tf-source-vm cannot reach tf-dest-vm. Connections just hang — no error, no RST, just silence. Checked the NSGs on both VMs and nothing looks wrong. The networking team did some route table work on the subnet this morning as part of a hardening exercise. Timing lines up but I’m not sure if that’s the cause. Resource group: nw-forensics-rg.

Ghost Agent calls effective_route_inspector with vm_name=tf-source-vm, dst_ip=10.0.1.5.

Actual verdict artifact — rt_20260414_092951_verdict.json:

{
  "mode": "single-target",
  "dst_ip": "10.0.1.5",
  "result": "WINNER",
  "winning_route": {
    "prefix": "10.0.1.5/32",
    "prefix_length": 32,
    "next_hop_type": "None",
    "next_hop_ip": null,
    "source": "User",
    "state": "Active",
    "route_name": "blackhole-dest-vm"
  },
  "selection_reason": "LPM_ONLY",
  "shadowed_candidates": [
    {
      "prefix": "10.0.0.0/16",
      "prefix_length": 16,
      "next_hop_type": "VnetLocal",
      "source": "Default",
      "state": "Active"
    }
  ],
  "anomaly_warnings": [
    "BLACKHOLE_WARNING: winning route 10.0.1.5/32 has next_hop_type 'None' — Azure will silently drop traffic"
  ],
  "session_id": "rt_20260414_092951",
  "vm_name": "tf-source-vm",
  "nic_name": "tf-source-vm818_z2"
}

The verdict is complete: route name blackhole-dest-vm, source tier User, prefix /32, next_hop_type None. The shadowed candidates confirm the route that would have delivered the packet — Default VNetLocal /16 — and exactly why it lost: selection_reason: LPM_ONLY. Longest prefix match selected the /32 before source precedence was even evaluated. The VNetLocal route was healthy and never used.

Actual Ghost Agent investigation report — ghost_20260414_092920:

Root Cause
Traffic from tf-source-vm to tf-dest-vm (10.0.1.5) is being blackholed due to a
User route with next_hop_type 'None'. The effective route inspector
(audit_id=rt_20260414_092951) confirms that the route table change is the cause.

Hypotheses
H1  Route table change causing traffic to be blackholed.   Confirmed
H2  NSG rule blocking traffic.                             Refuted
H3  Firewall rule on tf-dest-vm blocking traffic.          Refuted

Recommended Actions
1. Remove the blackhole route for 10.0.1.5/32 from the route table.
   Verify connectivity after removing the route.

Investigation closes at L3. No packet capture. No multi-team escalation. The route name, prefix, source tier, and next-hop type are in the verdict artifact, and the fix follows directly from them.


Where Effective Route Inspector Fits

Ghost Agent calls this tool first when the symptom is routing-layer — silent drops, wrong path, suspected NVA bypass. NSG evaluation and routing are independent operations. A routing blackhole silently drops traffic; the NSG still evaluates outbound rules and returns an accurate ALLOW. That ALLOW is correct and completely misleading. L3 must be confirmed clean before the investigation proceeds to L4.
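
The "L3 before L4" rule can be expressed as a small gate over the verdict. This is a sketch under assumptions: the function names and the "stop_at_l3" label are hypothetical, and treating every anomaly warning (including NVA_WARNING, which may describe an intended path) as "not clean" is a conservative choice made for the example.

```python
def routing_is_clean(verdict):
    """True only for a WINNER verdict that carries no anomaly warnings."""
    return (verdict.get("result") == "WINNER"
            and not verdict.get("anomaly_warnings"))


def next_step(verdict):
    """Decide whether the investigation may move from L3 to L4."""
    if routing_is_clean(verdict):
        return "security_rule_inspector"  # L4: NSG evaluation
    # NO_ROUTE, TIED_BGP, or any warning: resolve routing first.
    return "stop_at_l3"  # hypothetical label
```

Applied to the blackhole verdict shown earlier, next_step returns "stop_at_l3", so an NSG ALLOW is never consulted while the routing layer is known to drop the packet.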

Ghost Agent — Azure Connectivity Investigation Sequence
══════════════════════════════════════════════════════════════════════

  Symptom described in plain English
        │
        ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │  L3: Route verdict           ← effective_route_inspector         │
  │  Which route wins? Blackhole? Phantom NVA? Invalid shadow?       │
  │  → WINNER | NO_ROUTE | TIED_BGP                                  │
  └──────┬───────────────────────────────────────────────────────────┘
         │ routing clean
         ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │  L4: NSG evaluation          ← security_rule_inspector           │
  │  Which gate fired? Subnet NSG or NIC NSG? ALLOW or DENY?         │
  └──────┬───────────────────────────────────────────────────────────┘
         │ NSG ALLOW
         ▼
  ┌─────────────────────────────────────────────────────────────────┐
  │  Host firewall               ← detect_config_drift              │
  │  iptables / nftables inside the VM                              │
  └─────────────────────────────────────────────────────────────────┘

Relationship to detect_effective_network_drift. The drift tool answers: did routing change between two points in time? The route inspector tells you: which route wins right now for this destination, and why. Both are needed. Drift tells you something changed. The inspector tells you what the current state means for a specific packet.


What Made This Hard to Build Right

1. The effective route table output is not a clean flat list

az network nic show-effective-route-table returns routes in several envelope formats depending on API version and region (value wrapper, effectiveRoutes key, raw list). Multi-prefix route entries exist — one route object covering several CIDR prefixes. States are inconsistently cased. Invalid routes appear in the same list as Active routes.

The preprocessor normalises all of this to a flat list of route dicts (one row per prefix, validated CIDRs, consistent state values) before the algorithm sees it. If normalisation and selection aren’t separated, the algorithm can’t be tested in isolation.
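
A minimal sketch of that normalisation step follows. It is not the tool's route_preprocessor.py: the function names are hypothetical, and the raw field names (addressPrefix, nextHopType, nextHopIpAddress, source, state, name) are an assumption about the JSON the CLI returns. The output keys mirror the verdict artifact shown earlier.

```python
import ipaddress


def unwrap(payload):
    """Return raw route objects from a list, 'value', or 'effectiveRoutes'."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        for key in ("value", "effectiveRoutes"):
            if isinstance(payload.get(key), list):
                return payload[key]
    raise ValueError("unrecognised effective route table envelope")


def _as_list(value):
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def normalise(payload):
    """Flatten to one dict per valid CIDR prefix with consistent casing."""
    rows = []
    for raw in unwrap(payload):
        hops = _as_list(raw.get("nextHopIpAddress"))
        for prefix in _as_list(raw.get("addressPrefix")):
            try:
                network = ipaddress.ip_network(prefix, strict=False)
            except ValueError:
                continue  # not a CIDR; leave it out rather than guess
            rows.append({
                "prefix": str(network),
                "prefix_length": network.prefixlen,
                "next_hop_type": raw.get("nextHopType"),
                "next_hop_ip": hops[0] if hops else None,
                "source": raw.get("source"),
                # "ACTIVE", "active" and "Active" all become "Active".
                "state": str(raw.get("state", "")).strip().capitalize(),
                "route_name": raw.get("name"),
            })
    return rows
```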

2. BGP tie-break cannot be resolved at this layer

When two VirtualNetworkGateway routes share the longest prefix, they remain tied after source precedence. Azure’s actual tiebreaker is BGP AS Path length — shorter AS Path wins. But AS Path attributes are not present in the effective route table JSON. Resolving the tie requires a separate query to the gateway’s BGP peer table.

The naive response is to pick one arbitrarily. The correct response is TIED_BGP: stop, surface the tie, and tell the investigation where to look next. A tool that fabricates a winner to appear decisive is producing a confident wrong answer. That’s the failure mode this tool was built to prevent.

3. Configured type and effective type can differ

A route configured as VirtualAppliance can appear as next_hop_type: None in the effective route table. The configured state and the effective state diverge silently — no error is generated anywhere. The route table resource shows the configured intent; the NIC shows what Azure enforces.

The tool queries the NIC. That single design decision is the reason it finds what the portal doesn’t.
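
To make the divergence concrete, the sketch below compares configured routes with the NIC's effective routes and reports any prefix whose next hop type differs. This is an illustration only and is not described as part of the tool: the function name and both input shapes (lists of dicts with prefix, next_hop_type and, for effective routes, source) are hypothetical.

```python
def find_divergent_routes(configured, effective):
    """List prefixes whose effective next hop type differs from the configured one."""
    # Only User routes in the effective table originate from a route table resource.
    effective_type = {r["prefix"]: r["next_hop_type"]
                      for r in effective if r["source"] == "User"}
    divergent = []
    for route in configured:
        seen = effective_type.get(route["prefix"])
        # Prefixes absent from the effective table are skipped in this sketch.
        if seen is not None and seen != route["next_hop_type"]:
            divergent.append({"prefix": route["prefix"],
                              "configured": route["next_hop_type"],
                              "effective": seen})
    return divergent
```

Note that the string "None" (Azure's next hop type) is distinct from Python's None, which here only means the prefix was not found in the effective table.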


What It Takes to Run

Prerequisites:

  • Python 3.12+, uv package runner: curl -LsSf https://astral.sh/uv/install.sh | sh
  • Azure CLI authenticated: az login
  • RBAC: Microsoft.Network/networkInterfaces/effectiveRouteTable/action — included in Network Contributor; not included in the built-in Reader role

Standalone CLI:

# Single-target: which route wins for a specific destination?
python effective_route_inspector.py \
  --vm-name     tf-source-vm \
  --resource-group nw-forensics-rg \
  --dst-ip      10.0.1.5

# Audit: full route table — surface all blackholes and NVA routes
python effective_route_inspector.py \
  --vm-name     tf-source-vm \
  --resource-group nw-forensics-rg

# Skip NIC resolution if you already have the NIC name
python effective_route_inspector.py \
  --vm-name     tf-source-vm \
  --resource-group nw-forensics-rg \
  --dst-ip      10.0.1.5 \
  --nic-name    tf-source-vm-nic

Ghost Agent integration: Set VM_NAME, RESOURCE_GROUP, and AUDIT_DIR in config.env. The effective_route_inspector tool calls this as a subprocess, constructs the artifact path deterministically from the session ID, and returns the structured verdict to the Brain. Triggering prompts: “cannot reach destination VM — no error, just silence”, “internet access timing out after route table change”, “traffic is going to the wrong NVA.”
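
Deterministic artifact paths might be derived as in the sketch below. The helper names are hypothetical. The file naming follows the example artifact in this article (rt_20260414_092951_verdict.json for session rt_20260414_092951), and the timestamp format is inferred from that example rather than confirmed by the source.

```python
import json
from datetime import datetime
from pathlib import Path


def new_session_id(now=None):
    """Build a session ID of the form rt_YYYYMMDD_HHMMSS."""
    now = now or datetime.now()
    return "rt_" + now.strftime("%Y%m%d_%H%M%S")


def artifact_paths(audit_dir, session_id):
    """Return (raw_path, verdict_path) for a session; no filesystem access."""
    base = Path(audit_dir)
    return (base / f"{session_id}_raw.json",
            base / f"{session_id}_verdict.json")


def read_verdict(audit_dir, session_id):
    """Load the structured verdict the inspector wrote for this session."""
    _, verdict_path = artifact_paths(audit_dir, session_id)
    return json.loads(verdict_path.read_text(encoding="utf-8"))
```

Because the path is a pure function of AUDIT_DIR and the session ID, the caller never has to parse the subprocess's stdout to find the verdict.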


Scope Limits

A WINNER verdict with no anomaly confirms L3 routing only. It does not mean traffic reaches the application.

  • NSG rules are not evaluated here. Use security_rule_inspector.
  • OS-level firewall (iptables, nftables) is not inspected here. Use detect_config_drift.
  • Routing drift (did something change?) is not answered here. Use detect_effective_network_drift.
  • BGP AS Path tie — when two VirtualNetworkGateway routes share the longest prefix, the verdict is TIED_BGP. AS Path is not in the effective route table JSON.

Conclusion

Effective Route Inspector makes one step in a connectivity investigation deterministic: applying the Azure route selection algorithm to the NIC’s effective route table and returning a structured verdict. It surfaces the blackhole, the phantom NVA, and the LPM win that explains why a healthy VNetLocal route was never used. None of these findings appears when you query configured state alone. The verdict is machine-readable and auditable. Ghost Agent Brain reads it and produces the RCA.

For engineering leaders: the routing investigation step that previously depended on which engineer was on call and whether they knew to query the NIC effective route table is now a structured, repeatable operation with an artifact that names the winning route, its source tier, and the anomaly.


GitHub: github.com/ranga-sampath/agentic-network-tools

Clone the repo. Pick a VM with a suspected routing issue and run:

python effective_route_inspector.py \
  --vm-name YOUR_VM --resource-group YOUR_RG --dst-ip YOUR_DESTINATION

The verdict tells you which route wins, which routes it beat, and whether what won should concern you.