The Agentic PCAP Forensic Engine

AI-Powered Network Analysis from Raw Packet Captures

For network engineers, cloud architects, and the product leaders who support them.


The Problem

A packet capture is the most honest evidence available in a network investigation. It is also the hardest to read.

Raw .pcap files do not have opinions. They record every frame — retransmissions, DNS timeouts, ARP replies, RST teardowns — without context, without prioritization, and without telling you which of the 134 packets (or 134,000) actually explain why the application is failing.

The engineer who can open a PCAP in Wireshark, apply the right filters, identify the retransmission pattern, correlate it with the ICMP “Host Unreachable” from the upstream router, and connect that to the DNS SERVFAIL for the same destination — that engineer exists. There are not many of them. And when the on-call rotation lands on someone else, the PCAP sits there, unread, while the investigation stalls at the control plane.

The Agentic PCAP Forensic Engine was built to remove that dependency.


How Engineers Analyze PCAPs Today

Engineer receives .pcap file
  → Opens Wireshark or runs tshark manually
      → Applies display filters one at a time: tcp.analysis.retransmission
          → Counts retransmissions per stream manually
          → Does not correlate with ICMP Unreachable from upstream router
      → Opens Statistics > DNS tab
          → Sees NXDOMAIN count — does not check if the subdomain pattern is DGA
      → Exports TCP streams to CSV, opens in spreadsheet
          → Calculates RTT percentiles manually — tedious, error-prone
  → Documents findings in a text file
      → No structured severities, no remediation commands, no frame citations
  → Result: protocol-by-protocol notes that miss cross-layer root cause

Each protocol is analyzed in isolation, so cross-layer correlations are missed. The depth of analysis is bounded by the analyst’s recall of RFC failure modes under time pressure.
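For reference, a single tshark pass of the kind listed above can be scripted rather than typed by hand. The display filter and field names below are real tshark identifiers; the wrapper function is illustrative, not part of the engine:

```python
import shutil
import subprocess

def tshark_pass(pcap_path, display_filter, fields):
    """Build the argv for one tshark extraction pass: one protocol, one filter."""
    cmd = ["tshark", "-r", pcap_path, "-Y", display_filter, "-T", "fields"]
    for field in fields:
        cmd += ["-e", field]
    return cmd + ["-E", "separator=,"]

# One pass: every retransmitted segment, with frame number and stream id.
cmd = tshark_pass(
    "capture.pcap",
    "tcp.analysis.retransmission",
    ["frame.number", "tcp.stream", "ip.src", "ip.dst"],
)
if shutil.which("tshark"):  # only run where tshark is installed
    print(subprocess.run(cmd, capture_output=True, text=True).stdout)
```

Repeating this per protocol, one filter at a time, is the manual loop the pipeline below automates.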


What the PCAP Forensic Engine Does

It is a four-stage pipeline that transforms a raw packet capture into a ranked, actionable forensic report.

  ┌──────────────────────────────────────────────────────────────┐
  │  Input: capture.pcap  or  capture.cap                        │
  └───────────────────────────┬──────────────────────────────────┘
                              │
                              ▼
  ┌──────────────────────────────────────────────────────────────┐
  │  Stage 1 — Protocol Extraction  (tshark)                     │
  │                                                              │
  │  ARP · ICMP · TCP · DNS — each in a dedicated tshark pass    │
  │  Per-packet fields: flags, RTT, stream IDs, RCODE, MACs      │
  └───────────────────────────┬──────────────────────────────────┘
                              │
                              ▼
  ┌──────────────────────────────────────────────────────────────┐
  │  Stage 2 — Semantic Reduction                                │
  │                                                              │
  │  Packet rows → protocol statistics + structured anomaly flags│
  │  Percentile aggregation: min / median / p95 / max            │
  │  Up to 95% data reduction — diagnostic signal preserved      │
  │  Output: Semantic JSON (~10–50 KB)                           │
  └───────────────────────────┬──────────────────────────────────┘
                              │
                              ▼
  ┌──────────────────────────────────────────────────────────────┐
  │  Stage 3 — AI Forensic Reasoning  (Gemini 2.0 Flash)         │
  │                                                              │
  │  Cross-protocol correlation                                  │
  │  RFC-grounded anomaly interpretation                         │
  │  Severity ranking: CRITICAL > HIGH > MEDIUM > LOW > INFO     │
  └───────────────────────────┬──────────────────────────────────┘
                              │
                              ▼
  ┌──────────────────────────────────────────────────────────────┐
  │  Stage 4 — Forensic Report  (Markdown)                       │
  │                                                              │
  │  Executive Summary · Anomaly Table · RCA · Remediation       │
  └──────────────────────────────────────────────────────────────┘

Stage 2 is the critical step. An LLM cannot reason reliably over thousands of raw packet rows. It can reason precisely over a compact statistical summary — which is why the Semantic JSON exists.
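As a sketch of that reduction, here is the percentile collapse in miniature, assuming per-packet RTTs have already been parsed into a list (the function name and output shape are illustrative, not the engine's schema):

```python
import statistics

def reduce_rtts(rtt_ms):
    """Collapse per-packet RTTs into the min/median/p95/max summary an LLM can reason over."""
    s = sorted(rtt_ms)
    p95 = s[min(len(s) - 1, int(0.95 * len(s)))]  # nearest-rank 95th percentile
    return {"min": s[0], "median": statistics.median(s), "p95": p95, "max": s[-1]}

print(reduce_rtts([1.0, 2.0, 5.0, 5.0, 210.0]))
# → {'min': 1.0, 'median': 5.0, 'p95': 210.0, 'max': 210.0}
```

Five packets become four numbers; five thousand packets still become four numbers. That is the whole trade.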


A Real Capture, Analyzed

uv run python pcap_forensics.py kitchen_sink.pcap

The capture: 134 packets, 14.9 seconds, four protocols present — ARP, ICMP, TCP, DNS.

Step 1: Semantic JSON

134 packets reduce to a compact JSON. The ARP section, verbatim from the actual output:

"arp": {
  "total_requests": 7,
  "total_replies": 9,
  "unanswered_requests": [{"ip": "10.0.0.99", "count": 3}],
  "gratuitous_arp_count": 1,
  "duplicate_ip_alerts": [
    {
      "ip": "10.0.0.5",
      "macs": ["aa:bb:cc:dd:ee:05", "ff:ee:dd:cc:bb:aa"],
      "sample_frames": [7, 10]
    }
  ]
}

One field surfaces the critical finding: IP 10.0.0.5 is claimed by two different MAC addresses. In the raw capture this is buried across 134 frames. In the Semantic JSON it is a structured alert — protocol, IP, both MACs, and the exact frames.
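The detection behind that alert reduces to grouping ARP replies by claimed IP. A minimal sketch, assuming replies arrive as (frame, ip, mac) tuples rather than the engine's internal format:

```python
from collections import defaultdict

def find_ip_mac_conflicts(arp_replies):
    """arp_replies: iterable of (frame_no, ip, mac). Flag any IP claimed by more than one MAC."""
    seen = defaultdict(lambda: {"macs": [], "frames": []})
    for frame, ip, mac in arp_replies:
        entry = seen[ip]
        if mac not in entry["macs"]:  # record first frame for each distinct claimant
            entry["macs"].append(mac)
            entry["frames"].append(frame)
    return [
        {"ip": ip, "macs": e["macs"], "sample_frames": e["frames"]}
        for ip, e in seen.items() if len(e["macs"]) > 1
    ]

alerts = find_ip_mac_conflicts([
    (7, "10.0.0.5", "aa:bb:cc:dd:ee:05"),
    (9, "10.0.0.2", "aa:bb:cc:dd:ee:02"),
    (10, "10.0.0.5", "ff:ee:dd:cc:bb:aa"),
])
# → [{'ip': '10.0.0.5', 'macs': ['aa:bb:cc:dd:ee:05', 'ff:ee:dd:cc:bb:aa'], 'sample_frames': [7, 10]}]
```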

The TCP section captures anomalies at stream granularity:

{
  "stream_id": 1,
  "src": "10.0.0.1:40001",
  "dst": "10.0.0.2:80",
  "retransmissions": 5,
  "rst": true,
  "ack_rtt_ms": {"min": 1.0, "median": 210.0, "p95": 210.0, "max": 210.0},
  "sample_frames": [68, 71, 74, 77, 80]
}

Stream 0 on the same destination has a median ACK RTT of 5ms. Stream 1 is at 210ms — 42x worse — and ends with a RST. Both streams go to the same host. The difference is the signal.
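That sibling-stream comparison can be expressed as a simple ratio check. A sketch, assuming the per-stream records shown above (the 10x threshold is illustrative):

```python
def flag_slow_streams(streams, ratio=10.0):
    """Flag stream ids whose median ACK RTT is >= ratio x the fastest stream to the same host."""
    by_host = {}
    for s in streams:
        host = s["dst"].rsplit(":", 1)[0]  # group by destination host, ignoring port
        by_host.setdefault(host, []).append(s)
    flagged = []
    for group in by_host.values():
        best = min(s["ack_rtt_ms"]["median"] for s in group)
        flagged += [
            s["stream_id"] for s in group
            if best > 0 and s["ack_rtt_ms"]["median"] / best >= ratio
        ]
    return flagged

streams = [
    {"stream_id": 0, "dst": "10.0.0.2:80", "ack_rtt_ms": {"median": 5.0}},
    {"stream_id": 1, "dst": "10.0.0.2:80", "ack_rtt_ms": {"median": 210.0}},
]
print(flag_slow_streams(streams))  # → [1]
```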

Step 2: Forensic Report

The executive summary from the engine:

“The capture contains a critical security alert: multiple NXDOMAIN responses for domains matching a DGA pattern (“evil-c2.com”) indicate malware beaconing to command-and-control servers. Additionally, there is an IP-MAC conflict detected for IP address 10.0.0.5, signaling potential ARP spoofing. Host 10.0.0.99 is confirmed unreachable at both Layer 2 and Layer 3 — three unanswered ARP requests establish it is absent from the segment, and router 10.0.0.254 corroborates this with an ICMP Host Unreachable. Separately, TCP stream 1 is experiencing significant retransmissions to 10.0.0.2:80, and slow DNS queries are impacting resolution times.”

The anomaly table — exact output from the engine, ranked by severity:

| Severity | Protocol | Issue | Detail | Frame(s) |
|----------|----------|-------|--------|----------|
| CRITICAL | DNS | DGA Detection | Multiple NXDOMAIN responses for domains under “evil-c2.com” (e.g., ohgi1jny.evil-c2.com, t7rpwh6n.evil-c2.com) suggest DGA malware activity. | 104, 106, 108, 110, 112 |
| CRITICAL | ARP | IP-MAC Conflict | Multiple MAC addresses (aa:bb:cc:dd:ee:05, ff:ee:dd:cc:bb:aa) claim IP 10.0.0.5, indicating potential ARP spoofing. | 7, 10 |
| HIGH | ICMP | Host Unreachable | Router 10.0.0.254 reports “Host Unreachable” (code 1) for 10.0.0.99; error sent to 10.0.0.1. Corroborates unanswered ARP: 10.0.0.99 is confirmed unreachable at both L2 and L3. | 43 |
| HIGH | DNS | SERVFAIL | The domain “broken.internal” returns SERVFAIL, indicating a problem with the authoritative DNS server. | 114 |
| MEDIUM | TCP | Retransmissions | TCP stream 1 (10.0.0.1:40001 → 10.0.0.2:80) experiences 5 retransmissions. | 68, 71, 74, 77, 80 |
| MEDIUM | ICMP | Elevated RTT | ICMP RTT has a median of 5ms but a p95 of 300ms, indicating occasional significant latency spikes. | 37, 39, 41 |
| MEDIUM | DNS | Slow Queries | Queries for slow0.remote.com, slow1.remote.com, and slow2.remote.com take 500ms each. | 120, 122, 124 |
| LOW | ARP | Unanswered ARP Requests | 3 unanswered ARP requests for 10.0.0.99 | N/A |
| LOW | ICMP | TTL Exceeded | TTL Exceeded messages originate from 172.16.0.1, suggesting a potential routing issue. | 46 |

Nine findings across four protocols, ranked by severity, every finding tied to specific frames. The remediation section that follows gives exact CLI commands for each — not generic guidance.


Cross-Protocol Correlation

ARP unanswered requests + ICMP Host Unreachable — two layers, one dead host

The ARP section of the Semantic JSON shows 3 unanswered requests for 10.0.0.99. On its own, that is a LOW finding — the host might be powered off, on the wrong VLAN, or simply slow to respond.

The ICMP layer adds the routing-layer confirmation. The Semantic JSON records:

{
  "src": "10.0.0.254",
  "dst": "10.0.0.1",
  "code": 1,
  "code_meaning": "Host Unreachable",
  "unreachable_dst": "10.0.0.99"
}

This is standard ICMP Destination Unreachable behaviour: the router sends the error back to the original sender of the failed packet. dst (10.0.0.1) is the sender who receives the notification. unreachable_dst (10.0.0.99) is the host the router could not reach — extracted from the inner IP header embedded in the ICMP payload.

Now the picture is complete. 10.0.0.1 tried to reach 10.0.0.99. Its ARP requests for 10.0.0.99’s MAC went unanswered, so the address never resolved at Layer 2. Router 10.0.0.254 cannot forward traffic to 10.0.0.99 either. Both layers agree: the host is absent. That upgrades the LOW ARP finding into a confirmed infrastructure gap — something to act on, not just monitor.
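In Semantic JSON terms, the correlation is a set intersection between the two sections. A sketch using the field names from the excerpts above (the function itself is illustrative):

```python
def confirm_dead_hosts(arp_section, icmp_unreachables):
    """Hosts unreachable at both L2 (no ARP reply) and L3 (ICMP Host Unreachable, code 1)."""
    l2_missing = {u["ip"] for u in arp_section["unanswered_requests"]}
    l3_missing = {m["unreachable_dst"] for m in icmp_unreachables if m["code"] == 1}
    return sorted(l2_missing & l3_missing)

arp = {"unanswered_requests": [{"ip": "10.0.0.99", "count": 3}]}
icmp = [{"src": "10.0.0.254", "dst": "10.0.0.1", "code": 1, "unreachable_dst": "10.0.0.99"}]
print(confirm_dead_hosts(arp, icmp))  # → ['10.0.0.99']
```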

DNS NXDOMAIN pattern → DGA malware detection

Five NXDOMAIN responses for subdomains under evil-c2.com: ohgi1jny, t7rpwh6n, nyfx4m8t, 1qojah9t, q6v9mh7f. The randomness of the subdomain strings is the signal — no human types these. Gemini identifies the Domain Generation Algorithm pattern and escalates to CRITICAL. A threshold rule would flag “5 NXDOMAINs.” The engine reads the subdomain entropy, names the threat class, and gives the remediation. From the actual report:

“Identify the infected host: examine DNS query logs on the DNS server (10.0.0.53) to find the source IP address making the NXDOMAIN queries to domains like ‘ohgi1jny.evil-c2.com’. Isolate the infected host: disconnect from the network immediately to prevent further communication with the C2 server.”

The output is not a list of statistics. It is a diagnosis.
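For intuition, the entropy signal can be approximated with a Shannon-entropy heuristic on the leftmost DNS label. This illustrates the idea only; it is not the engine's or Gemini's actual method, and the threshold is arbitrary:

```python
import math
from collections import Counter

def shannon_entropy(label):
    """Bits of entropy per character in a DNS label."""
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_dga(qname, min_len=8, threshold=2.5):
    """Crude DGA test: a long leftmost label with a near-random character distribution."""
    label = qname.split(".")[0]
    return len(label) >= min_len and shannon_entropy(label) >= threshold

print(looks_dga("ohgi1jny.evil-c2.com"))  # → True  (8 distinct chars: 3.0 bits/char)
print(looks_dga("www.example.com"))       # → False (label too short)
```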


Three Operating Modes

Single Capture — Forensic Report

Default mode. One .pcap or .cap file in, one Markdown report out.

uv run python pcap_forensics.py capture.pcap
Output: capture_forensic_report.md

Temporal Comparison — Before vs. After

Two captures from the same segment at different times. The engine computes per-metric deltas and classifies each as STABLE, NEW ISSUE, REGRESSION, or RESOLVED. The change summary table, exact output from the comparison report:

| Protocol | Metric | Capture A | Capture B | Delta | Assessment |
|----------|--------|-----------|-----------|-------|------------|
| ARP | IP-MAC Conflicts | 0 | 1 | +1 | NEW ISSUE |
| ICMP | RTT Median (ms) | 5.0 | 5.0 | 0 | STABLE |
| TCP | Retransmission Rate | 0% | 0% | 0 | STABLE |
| TCP | Handshake Success Rate | 100% | 100% | 0 | STABLE |
| DNS | Latency Median (ms) | 15.0 | 15.0 | 0 | STABLE |

Everything stable — except a new ARP IP-MAC conflict that appeared between captures. Without the comparison, this would be invisible in the current-state capture alone.

uv run python pcap_forensics.py baseline.pcap --compare current.pcap --mode temporal
Output: baseline_vs_current_comparison.md
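The assessment column follows a simple classification rule. A sketch, assuming a metric where higher values are worse (the labels follow the table above; mapping every improvement to RESOLVED is an assumption, not the engine's documented behaviour):

```python
def classify_delta(a, b, higher_is_worse=True):
    """Classify one metric's change from capture A (before) to capture B (after)."""
    if a == b:
        return "STABLE"
    worsened = (b > a) == higher_is_worse
    if worsened:
        # A problem that did not exist before is NEW ISSUE; one that grew is REGRESSION.
        return "NEW ISSUE" if a == 0 else "REGRESSION"
    return "RESOLVED"

print(classify_delta(0, 1))      # ARP IP-MAC conflicts 0 → 1: NEW ISSUE
print(classify_delta(5.0, 5.0))  # ICMP RTT median unchanged: STABLE
```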

Endpoint Correlation — Source vs. Destination

Two simultaneous captures from both ends of a path. If a flow is present at the source but absent at the destination, the loss is in the path, not the endpoints — a finding no NSG or route table query can produce. Drop rate is calculated per matched flow: (source packets − destination packets) / source packets.

uv run python pcap_forensics.py source.pcap --compare dest.pcap --mode endpoint-correlation
Output: source_vs_dest_comparison.md
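The per-flow arithmetic is exactly the formula above. A sketch, assuming each side's flows are keyed by 5-tuple with a packet count at that capture point:

```python
def flow_drop_rates(source_flows, dest_flows):
    """Each argument maps a flow key (5-tuple) to its packet count at that capture point."""
    rates = {}
    for key, src_pkts in source_flows.items():
        dst_pkts = dest_flows.get(key, 0)  # absent at destination: 100% loss in the path
        rates[key] = (src_pkts - dst_pkts) / src_pkts if src_pkts else 0.0
    return rates

flow = ("10.0.0.1", 40001, "10.0.0.2", 80, "tcp")
print(flow_drop_rates({flow: 100}, {flow: 93})[flow])  # 7 of 100 packets lost: 0.07
```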

What It Detects

| Protocol | What the Engine Looks For |
|----------|---------------------------|
| ARP | IP-MAC conflicts (ARP spoofing / cache poisoning); unanswered requests (host down or unreachable); gratuitous ARP announcements (VRRP/HSRP failover, NIC teaming) |
| ICMP | Fragmentation Needed (type 3 code 4) — path MTU constraint, correlated with TCP retransmissions on large segments to identify MTU mismatch; routing loops (TTL Exceeded repeatedly from the same source); ICMP Redirect — potential traffic interception signal |
| TCP | Retransmission storms — scattered across many streams (network-wide loss) vs. concentrated on one stream (endpoint or application problem); 3 duplicate ACKs triggering Fast Retransmit (RFC 5681); zero-window stalls (application not reading fast enough — not a network problem); RST origin analysis (local endpoint teardown vs. middlebox injection); bufferbloat (ACK RTT p95/median > 10) |
| DNS | DGA malware (high-entropy random NXDOMAIN subdomain patterns); DNS tunneling (TXT queries > 20% of total); resolver overload (unanswered query rate > 5%); latency outliers; SERVFAIL attribution |

The default for any pattern the engine cannot classify confidently is to surface it at INFO severity. No signal is silently discarded.


What It Takes to Run

Prerequisites:

  • Python 3.12+ with uv
  • tshark — brew install wireshark or apt install tshark
  • A Gemini API key from aistudio.google.com — the free tier handles most captures

Configuration: Set GEMINI_API_KEY in a .env file in the project directory. No other configuration required for standalone use.

Run:

uv run python pcap_forensics.py your-capture.pcap

Cost per analysis: A typical analysis with Gemini 2.0 Flash costs under $0.02.


Standalone Tool, or Part of a Larger System

Standalone: Point it at any .pcap or .cap file — a tcpdump on a Linux VM, a Wireshark session on a laptop, a switch span port capture, or an Azure Network Watcher download. No cloud infrastructure required.

Integrated: Inside Ghost Agent, the engine is the final stage of an automated capture-and-analysis pipeline:

  Ghost Agent — control-plane analysis exhausted, wire-level evidence needed
        │
        ▼
  Cloud Orchestrator — creates Azure Network Watcher capture
        │  polls until Succeeded → downloads .cap blob from storage
        ▼
  PCAP Forensic Engine — runs against the downloaded file
        │  produces capture_forensic_report.md
        ▼
  Ghost Agent — reads report, incorporates wire-level findings into RCA

Ghost Agent does not call pcap_forensics.py directly — it invokes it through the Safety Shell, so the execution is logged, auditable, and classified by the same four-tier pipeline as every other command. The forensic report is auto-approved for reading (it is a file the system itself created), so Ghost Agent reads it without triggering a human approval prompt.


Conclusion

A packet capture is definitive evidence. It is also unusable without the expertise to read it. The PCAP Forensic Engine encodes that expertise — protocol failure mode knowledge, RFC semantics, cross-layer correlation — into a repeatable pipeline that produces a ranked, frame-cited forensic report from any capture file.

Run it standalone. Drop it into your automation. Or let Ghost Agent invoke it when the control plane runs out of answers.


GitHub: github.com/ranga-sampath/agentic-network-tools

Clone the repo. Run it against a capture. Read the report.