Lateral Movement Detection via Graph Analysis on VPC Flow Logs


From the hunt desk. Flow-by-flow lateral-movement rules such as “alert when a host opens SMB to N new destinations” catch worms and miss operators. Every red team you have ever paid for already knows to spread the activity across hosts, protocols, and time. The detection you actually want sees the shape of the communication graph, not individual flows. This post is the graph-analytic playbook for MITRE ATT&CK TA0008 (Lateral Movement) and TA0007 (Discovery), including the full sub-technique coverage table further down the page and validation paths built on public adversary-emulation atomics and open-source adversary-emulation frameworks.

Mature attackers do not move laterally in a straight line. They pivot through multiple hosts, change protocols at each hop — SMB to WinRM to RDP to SSH — and time their movements to blend with normal background traffic. A single-hop detection rule, no matter how well tuned, will miss the kill chain. The pattern is only visible when you stop looking at flows individually and start looking at the shape of the communication graph as a whole.

This playbook walks through a detection pipeline that builds a directed, weighted graph from internal-to-internal VPC Flow Log records, baselines it over a 14-day window, and then uses PageRank anomaly scoring, Louvain community detection, and Graph Neural Network (GNN) anomaly scoring to surface compromised pivot hosts, broken segment isolation, and complete multi-hop kill chains — all within attacker-feasible time windows.

It is part of a five-post series on production-grade VPC Flow Log detection engineering. The companion posts cover adaptive C2 beacon detection with FFT, low-and-slow data exfiltration with Isolation Forest and LSTM, botnet coordination with clustering, and living-off-the-land kill chains with Markov models. Together, the five pipelines cover the network-side of the entire MITRE ATT&CK kill chain — from initial access through exfiltration — using telemetry you already collect.

Why Per-Hop Lateral Movement Rules Miss the Kill Chain

A typical SIEM rule for lateral movement looks something like: “alert when a single host opens SMB to more than five new internal destinations in one hour.” That rule works for noisy worms — WannaCry, NotPetya — and almost nothing else. Modern operators avoid it the same way they avoid every other threshold-based rule: by spreading the activity across hosts, protocols, and time.

The structural problem is that lateral movement is not a property of any single flow. It is a property of paths. A web server reaching an LDAP server is normal. An LDAP server reaching a database server is normal. A database server reaching a domain controller is normal. The sequence — web → LDAP → DB → DC inside a 30-minute window, with the same attacker session driving each hop — is not. No flow-level alert can see that sequence because each hop in isolation looks routine.

Graph analysis fixes this by treating the network as what it actually is: a directed weighted graph where nodes are hosts and edges are communications. Once the graph is in memory, three classes of detection become possible that flow-by-flow rules cannot achieve:

Centrality-anomaly detection — a host whose betweenness centrality or PageRank score spikes above its baseline is acting as a pivot. Even if every individual flow looks legitimate, the host has stepped into a structural role it never previously played.
Edge-novelty detection: an edge that never existed in the 14-day baseline and uses a known lateral-movement port is treated as novel. When that edge also crosses a segmentation boundary (cross-community detection), it earns a 10× weight multiplier and becomes one of the strongest available signals for early-stage lateral movement.
Path-traversal detection — temporal depth-first search from a centrality-anomalous node, restricted to 30-minute windows, reconstructs the actual multi-hop sequence. The output is not just “host X is suspicious” but “host X → host Y → host Z → host W within 22 minutes” — a kill chain ready for analyst triage.

Building the Communication Graph from VPC Flow Logs

The graph construction is straightforward. Nodes are unique internal IPs (both srcAddr and dstAddr in RFC 1918 space). Edges are directed and weighted by total flow count and total bytes for that (src, dst) pair over the analysis window. Edge attributes capture protocol distribution (set of destination ports), temporal distribution (how the edge’s flows are spread across the window), and TCP flag patterns.

The lateral-movement-relevant destination ports we focus on are:

SMB / NetBIOS: 445, 135, 139
WinRM: 5985, 5986
RDP: 3389
SSH / Telnet: 22, 23
Databases: 1433 (SQL Server), 3306 (MySQL), 5432 (Postgres), 27017 (Mongo), 6379 (Redis)

Filtering at the VPC Flow Log level to only these ports — combined with srcaddr LIKE '10.%' AND dstaddr LIKE '10.%' — typically reduces a daily flow log volume by 95–99%, which means even very large enterprises can keep the graph in memory on a single Lambda or Glue worker.
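
The construction described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the production loader: the record keys (srcaddr, dstaddr, dstport, bytes, start) mirror the flow-log field names, while the function names build_edges and is_internal and the edge attribute names are assumptions made for the example. Unlike the LIKE '10.%' filter, the sketch checks all three RFC 1918 ranges.

```python
import ipaddress
from collections import defaultdict

# Lateral-movement ports listed above.
LATERAL_PORTS = {445, 135, 139, 5985, 5986, 3389, 22, 23,
                 1433, 3306, 5432, 27017, 6379}

# The three RFC 1918 private ranges.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_internal(addr):
    """True if addr falls inside any RFC 1918 range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


def build_edges(flows):
    """Aggregate flow records into directed edges keyed by (src, dst).

    Each flow is a dict with the (assumed) keys srcaddr, dstaddr,
    dstport, bytes and start (epoch seconds).
    """
    edges = defaultdict(lambda: {"flow_count": 0, "total_bytes": 0,
                                 "ports": set(), "starts": []})
    for f in flows:
        if f["dstport"] not in LATERAL_PORTS:
            continue  # not a lateral-movement port
        if not (is_internal(f["srcaddr"]) and is_internal(f["dstaddr"])):
            continue  # keep internal-to-internal traffic only
        e = edges[(f["srcaddr"], f["dstaddr"])]
        e["flow_count"] += 1
        e["total_bytes"] += f["bytes"]
        e["ports"].add(f["dstport"])
        e["starts"].append(f["start"])
    return dict(edges)
```

The resulting dictionary maps each (src, dst) pair to its weight and attribute summary, which is the shape most graph libraries accept when adding weighted edges.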

Baselining the Graph Over a 14-Day Window

You cannot detect anomalies without a baseline. The pipeline computes a per-node and per-edge baseline over a rolling 14-day window. Per-node we capture:

In-degree, out-degree — the simplest measures of how connected the host is.
Betweenness centrality — the proportion of shortest paths between all node pairs that pass through this node. High betweenness identifies the host as a “bridge.”
PageRank — recursive importance score; a host pointed to by many high-PageRank hosts is itself high-PageRank. Originally devised for web search (Page & Brin, 1998), the algorithm transfers cleanly to internal traffic.
Clustering coefficient — how interconnected a node’s neighbours are. Captures the local topology around each host.

Per-edge we capture the weight (flow count and byte count) and the temporal pattern (uniform across the window, bursty, or one-off). We also run the Louvain community-detection algorithm over the baseline to discover the network’s natural segmentation — even if the team has not formally documented its subnets. Each host gets a community label, and edges that cross community boundaries become high-interest under the anomaly model.
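
To make the PageRank baseline concrete, here is a small power-iteration sketch over weighted directed edges. It is an illustration under stated assumptions: the damping factor of 0.85 is the conventional default rather than a value given in this post, the function name pagerank and its parameters are chosen for the example, and in practice a graph library's own PageRank routine would normally be used instead.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Weighted PageRank by power iteration.

    edges maps (src, dst) -> non-negative weight (for example flow count).
    Returns {node: score}; the scores sum to 1.
    """
    nodes = sorted({n for pair in edges for n in pair})
    n = len(nodes)
    if n == 0:
        return {}

    # Total outbound weight per node, used to normalise each edge.
    out_weight = {v: 0.0 for v in nodes}
    for (src, _dst), w in edges.items():
        out_weight[src] += w

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outbound edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new = {v: (1.0 - damping) / n + damping * dangling / n
               for v in nodes}
        for (src, dst), w in edges.items():
            if out_weight[src] > 0:
                new[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

Running this once per day of the 14-day window gives a series of per-host scores from which a mean and standard deviation can be taken.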

Anomaly Detection: Centrality Spikes and Novel Edges

Once the baseline is established, anomaly detection runs on each new 1-hour graph snapshot. The core comparison is a per-metric z-score:

z = (current_metric − μ_baseline) / σ_baseline
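
A minimal sketch of that comparison, assuming the baseline is held as a plain list of daily values per host. The handling of a perfectly flat baseline (zero standard deviation) is an assumption of this example, not something the pipeline description specifies.

```python
import math
from statistics import mean, pstdev


def zscore(current, baseline):
    """z = (current - mean(baseline)) / std(baseline).

    A flat baseline has zero spread, so any deviation from it is
    reported as +/- infinity (an assumption made for this sketch).
    """
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        if current == mu:
            return 0.0
        return math.copysign(math.inf, current - mu)
    return (current - mu) / sigma


def is_anomalous(current, baseline, threshold=3.0):
    """True when the metric sits more than `threshold` sigmas from baseline."""
    return abs(zscore(current, baseline)) > threshold
```

The same two functions apply unchanged to PageRank, betweenness, and out-degree, since each is just a per-host series of numbers.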

Three signals fire alerts on their own; a fourth signal is structural and earns a 10× multiplier:

PageRank anomaly: |PR_current(node) − μ_PR_baseline(node)| / σ_PR_baseline(node) > 3.0. A node whose PageRank moves more than three standard deviations from its own baseline, for example because newly compromised hosts are calling out to it, fires this alert.
Betweenness anomaly: same z-score threshold against the betweenness centrality baseline. Flags hosts that have suddenly become network bridges.
Out-degree explosion: a host opening connections to many new destinations it has never contacted before. Classic enumeration / scanning behaviour.
Cross-community novel edges: any edge that (a) is new, (b) uses a lateral-movement port, and (c) crosses a Louvain community boundary. These are flagged automatically and get the 10× weight multiplier when summed into the host’s lateral-movement risk score.

The combination matters. A host with anomalous PageRank and a new SMB edge into a different community is, in operational terms, almost certainly compromised. The same host with only one of those signals is plausibly explainable as a benign change (new service rolled out, planned admin work). Tuning the alert threshold on the combination rather than the individual signals is the secret to a tractable false-positive rate.
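
The structural signal and the combined score can be sketched as follows. The three-part test (new, lateral port, crosses a community) and the 10× multiplier come from the list above; everything else is an assumption for illustration, including the function names, the decision to skip hosts that have no community label, and the choice to count one point per z-score signal over threshold.

```python
def novel_cross_community_edges(current_edges, baseline_edges, community,
                                lateral_ports):
    """Return edges that are new, use a lateral port, and cross communities.

    current_edges:  {(src, dst): set of destination ports seen this hour}
    baseline_edges: set of (src, dst) pairs seen in the 14-day baseline
    community:      {host: Louvain community label}
    """
    hits = []
    for (src, dst), ports in sorted(current_edges.items()):
        if (src, dst) in baseline_edges:
            continue  # (a) not new
        if not (ports & lateral_ports):
            continue  # (b) no lateral-movement port
        c_src, c_dst = community.get(src), community.get(dst)
        if c_src is None or c_dst is None or c_src == c_dst:
            continue  # (c) same community, or unlabelled host (skipped here)
        hits.append((src, dst))
    return hits


def host_risk(host, z_scores, novel_edges, multiplier=10.0, z_threshold=3.0):
    """Illustrative host score: one point per z-score signal over the
    threshold, plus `multiplier` points per novel cross-community edge
    that leaves the host."""
    base = sum(1.0 for z in z_scores.values() if abs(z) > z_threshold)
    structural = sum(1 for src, _dst in novel_edges if src == host)
    return base + multiplier * structural
```

With this weighting, a single structural edge outweighs every standalone centrality signal, which is the behaviour the combination argument above calls for.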

Attack-Path Reconstruction with Temporal DFS

Identifying an anomalous node is half the work. The other half is reconstructing the kill chain so an analyst can read it as a story. The pipeline runs a depth-first search from each anomalous node, but with a strict temporal constraint: the next hop must occur within 30 minutes of the previous hop, and only outbound edges are followed. The DFS terminates at one of three boundaries:

A node whose outbound activity stops (the dead end — typically the final target).
A node that reaches the public internet via an egress flow (the exfiltration hop).
The 30-minute timeout (the chain stalled or the analyst is too late).
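
The search can be sketched as a recursive DFS over time-stamped events. This is a simplified illustration: it covers the dead-end and 30-minute boundaries but leaves out the internet-egress boundary, and it adds two assumptions of its own, a max_hops cap and a rule that a host is not revisited within one path. The event tuple layout (ts, src, dst, port) and the function name temporal_paths are hypothetical.

```python
from collections import defaultdict


def temporal_paths(events, start, window=1800, max_hops=8):
    """Enumerate maximal outbound paths from `start`.

    events: iterable of (ts, src, dst, port), ts in epoch seconds.
    Each hop must begin no earlier than the previous hop and no more
    than `window` seconds after it.
    """
    out = defaultdict(list)
    for ts, src, dst, port in sorted(events):
        out[src].append((ts, dst, port))

    paths = []

    def dfs(node, last_ts, path, visited):
        extended = False
        if len(path) < max_hops:
            for ts, dst, port in out.get(node, ()):
                if dst in visited:
                    continue  # do not loop back through a visited host
                if last_ts is not None and not (last_ts <= ts <= last_ts + window):
                    continue  # outside the temporal window
                extended = True
                dfs(dst, ts, path + [(node, dst, port, ts)], visited | {dst})
        if not extended and path:
            paths.append(path)  # dead end or timeout: record the chain

    dfs(start, None, [], {start})
    return paths
```

Each returned path is a list of (src, dst, port, ts) hops, which can be passed directly to a scoring step.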

The reconstructed path is scored:

path_score = hop_count × protocol_diversity × cross_community_count

A four-hop chain that traverses SMB, WinRM, RDP, and HTTPS — and crosses three Louvain communities along the way — scores enormously higher than a four-hop chain that stays within one community on a single protocol. The scoring function deliberately rewards exactly the behaviour real operators exhibit when they pivot toward sensitive assets.
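
A literal reading of the formula, as a sketch. The hop layout (src, dst, port, ...) and the use of the distinct destination-port count as protocol_diversity are assumptions of this example. Note that under a purely multiplicative form, a chain that never crosses a community boundary scores zero.

```python
def path_score(path, community):
    """path_score = hop_count x protocol_diversity x cross_community_count.

    path:      list of hops; each hop starts with (src, dst, port)
    community: {host: Louvain community label}
    """
    hop_count = len(path)
    protocol_diversity = len({hop[2] for hop in path})
    cross_community_count = sum(
        1 for hop in path if community[hop[0]] != community[hop[1]]
    )
    return hop_count * protocol_diversity * cross_community_count
```

For a four-hop chain over ports 445, 5985, 3389, and 443 that crosses three boundaries, this gives 4 × 4 × 3 = 48; the same four hops on one port inside one community give 4 × 1 × 0 = 0.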

Feature Engineering from VPC Flow Logs

Feature	Source attributes	Formula	What it captures
Out-degree delta	srcAddr, dstAddr	current_unique_dst − baseline_avg_dst	New connections from a host — first signal of enumeration
Betweenness centrality	srcAddr, dstAddr	σ(s,t|v) / σ(s,t) across all pairs	Pivot-point detection
PageRank anomaly	srcAddr, dstAddr, bytes	PR(current) − PR(baseline)	Structural importance shift
Protocol diversity	dstPort per edge	COUNT(DISTINCT dstPort) per src → dst	Multi-protocol lateral movement
Cross-segment flag	srcAddr, dstAddr, subnet_id	IF src_subnet ≠ dst_subnet THEN 1 ELSE 0	Segmentation boundary crossing
Temporal chain score	start, srcAddr, dstAddr	Σ (1 / time_gap) for sequential hops	Fast hop sequences
TCP flag entropy	tcp_flags	entropy(tcp_flags distribution)	Unusual handshake patterns

All seven features come straight from standard VPC Flow Logs (v3+ for subnet_id and tcp_flags; if you are still on v2, enable v3 today). The pipeline does not need any external enrichment to function — though pairing the output with identity context (which IAM principal owns which instance) makes triage dramatically faster.
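
Two of the less obvious features in the table, TCP flag entropy and the temporal chain score, are sketched below. The entropy is Shannon entropy in bits over the observed tcp_flags values; the min_gap clamp that prevents division by zero for same-second hops is an assumption added for the example, as are the function names.

```python
import math
from collections import Counter


def tcp_flag_entropy(flag_values):
    """Shannon entropy (bits) of the tcp_flags values seen on an edge."""
    counts = Counter(flag_values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def temporal_chain_score(hop_times, min_gap=1.0):
    """Sum of 1 / time_gap over consecutive hops (times in epoch seconds).

    Gaps shorter than `min_gap` are clamped so that hops recorded in the
    same second do not divide by zero.
    """
    gaps = [later - earlier for earlier, later in zip(hop_times, hop_times[1:])]
    return sum(1.0 / max(gap, min_gap) for gap in gaps)
```

An edge that only ever shows one flag combination has entropy 0, while fast hop sequences push the chain score up because each short gap contributes a large reciprocal.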

Athena SQL — Edge Extraction for the Graph Pipeline

Athena handles the heavy filtering and aggregation. The query below extracts candidate lateral-movement edges — flows on lateral-movement ports between internal hosts — and aggregates them into edges with attribute summaries ready for the graph layer.

-- Candidate lateral-movement flows: accepted, internal-to-internal, lateral ports only.
WITH internal_flows AS (
    SELECT srcaddr, dstaddr, dstport, bytes, packets, start, tcp_flags,
           subnet_id, interface_id, instance_id
    FROM central_vpc_flow_logs
    WHERE action = 'ACCEPT'
      -- Matches 10.0.0.0/8 only; extend for other private ranges in use.
      AND srcaddr LIKE '10.%' AND dstaddr LIKE '10.%'
      AND dstport IN (445, 135, 139, 5985, 5986, 3389, 22, 23, 1433, 3306, 5432, 27017, 6379)
      -- Restricting on the date partition keeps the scan small.
      AND day BETWEEN '2026/03/19' AND '2026/03/23'
),
-- One row per directed (src, dst) edge with its attribute summary.
edge_summary AS (
    SELECT srcaddr, dstaddr,
           COUNT(*)                  AS flow_count,
           COUNT(DISTINCT dstport)   AS protocol_diversity,
           SUM(bytes)                AS total_bytes,
           array_agg(DISTINCT dstport) AS ports_used,
           MIN(start)                AS first_contact,
           MAX(start)                AS last_contact,
           COUNT(DISTINCT instance_id) AS instances_involved
    FROM internal_flows
    GROUP BY srcaddr, dstaddr
)
SELECT *,
       flow_count * protocol_diversity AS lateral_risk_score
FROM edge_summary
WHERE protocol_diversity >= 2 OR flow_count > 50
ORDER BY lateral_risk_score DESC;

A few notes on tuning:

The WHERE clause in the first CTE filters to internal-to-internal flows on lateral-movement ports, but the LIKE '10.%' pattern only matches 10.0.0.0/8. If your private space also uses 172.16.0.0/12, 192.168.0.0/16, or the 100.64.0.0/10 carrier-grade NAT range, extend the filter.
The final WHERE clause on edge_summary is the main tunable that controls noise. protocol_diversity >= 2 is the strongest filter, because legitimate workload-to-workload communication very rarely spans more than one protocol per src → dst pair. flow_count > 50 keeps the high-volume legitimate edges that we want in the graph anyway (and which form the baseline for centrality metrics).
Output volume: a mid-size enterprise typically produces 5,000–50,000 candidate edges per day after this filter — small enough for any graph library (NetworkX, PyTorch Geometric, DGL) to handle in seconds.

The Lateral Movement Risk Score

Once edges are loaded into the graph and centralities are computed, the final risk score for each candidate sequence is:

Lateral Movement Score = Σ (edge_weight × protocol_diversity × cross_segment_penalty)

cross_segment_penalty = 3.0  if src_subnet ≠ dst_subnet
                       = 1.0  otherwise

The score is computed per reconstructed path, not per node. A high-scoring path is a kill chain: it carries the weight of multiple edges, the diversity of multiple protocols, and the multiplied penalty of every segment boundary crossed. In our experience the alerting threshold lands somewhere between 50 and 200 depending on enterprise size; tune it with a four-week historical backtest on known clean traffic.
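
The formula translates directly into a short function. In this sketch the edge dictionary keys follow the formula's own names, while the meaning of edge_weight (raw flow count, normalised weight, or something else) is left to the caller because the formula does not pin it down; the function name is hypothetical.

```python
def lateral_movement_score(path_edges, cross_penalty=3.0):
    """Sum of edge_weight x protocol_diversity x cross_segment_penalty.

    path_edges: list of dicts with keys edge_weight, protocol_diversity,
                src_subnet and dst_subnet, one per hop of the path.
    """
    total = 0.0
    for e in path_edges:
        # 3.0 when the hop crosses a subnet boundary, 1.0 otherwise.
        penalty = cross_penalty if e["src_subnet"] != e["dst_subnet"] else 1.0
        total += e["edge_weight"] * e["protocol_diversity"] * penalty
    return total
```

A two-hop path with one cross-subnet hop (weight 5, two protocols) and one same-subnet hop (weight 4, one protocol) scores 5 × 2 × 3 + 4 × 1 × 1 = 34.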

Putting It Into Production

The end-to-end architecture is intentionally lightweight:

VPC Flow Logs → S3 (Parquet partitioned by date). If you have not enabled this, our VPC Flow Logs hunting primer covers the setup.
EventBridge → daily Lambda kicks off the Athena query above at 03:00 local time. Output lands in a separate S3 prefix.
Glue / Spark / SageMaker job loads the edges into a NetworkX or PyTorch Geometric graph, computes centralities, runs Louvain, and produces per-host and per-edge anomaly scores.
Anomaly scores → SNS/Kinesis → SIEM, with the reconstructed path attached as a structured field.
14-day baseline refresh runs nightly with a sliding window. Older data is aged out so the baseline stays current with planned environment changes.

For the GNN variant of the pipeline — useful when you have labelled incidents to train on — PyTorch Geometric or DGL provides everything needed. We start with a 2-layer GraphSAGE classifier over node features (centralities + role tags) and edge features (protocol diversity + temporal pattern), trained on a few hundred labelled incidents from past breach disclosures and your own red-team exercises. The unsupervised pipeline (z-scores + Louvain) works without any labels, which is where most teams start.

Limits and False-Positive Sources

Real networks have legitimate centrality concentrations. Common sources of false positives:

Domain controllers, DNS resolvers, and centralised log collectors are designed to be high-betweenness, high-PageRank nodes. They produce constant baseline anomalies until allow-listed.
Backup servers hit every host on a schedule and look like distributed lateral movement to the algorithm.
Vulnerability scanners (OpenVAS and its commercial equivalents) deliberately do exactly what the pipeline alerts on. Maintain an allow-list of scanner subnets.
Configuration management agents (Ansible push, Puppet master, Chef server) reach into every host on a schedule.
Newly deployed services create new edges that look novel to the 14-day baseline. The pipeline will alert until the baseline catches up; use a service-deployment notification channel to pre-suppress.

The cleanest operational pattern is a maintained allow-list of “known structural hubs” — DCs, DNS, scanner subnets, backup endpoints — combined with role-based suppression for newly deployed services during the first 14 days after deployment.
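
One way to express that suppression logic, as a sketch. The data shapes are assumptions: structural_hubs is a set of allow-listed host addresses, and deployed_at maps a host to the time its service was deployed, as fed by whatever deployment-notification channel you use.

```python
from datetime import datetime, timedelta


def should_suppress(host, alert_time, structural_hubs, deployed_at,
                    grace=timedelta(days=14)):
    """True if an alert on `host` should be suppressed.

    Suppress when the host is an allow-listed structural hub, or when
    it was deployed less than `grace` before the alert fired.
    """
    if host in structural_hubs:
        return True
    deployed = deployed_at.get(host)
    if deployed is None:
        return False
    return timedelta(0) <= alert_time - deployed < grace
```

Keeping the grace period equal to the baseline window means suppression ends at roughly the point where the new edges have been absorbed into the baseline.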

MITRE ATT&CK Techniques Covered by This Detection

This pipeline targets the Lateral Movement (TA0008) and Discovery (TA0007) tactics, with adjacent coverage of Credential Access (TA0006). The graph-anomaly signal fires on the structural traces an operator leaves behind during hands-on-keyboard movement, regardless of which specific tool they used. The table is your purple-team coverage worksheet.

ATT&CK ID	Technique / sub-technique	Coverage	Hunter notes
T1021	Remote Services (parent)	Full	Core surface — every sub-technique below produces graph-level evidence
T1021.001	Remote Desktop Protocol (RDP)	Full	Multi-hop RDP is a classic chain — DFS reconstruction trivial
T1021.002	SMB / Windows Admin Shares	Full	Port 445/135/139 carry the highest edge weight in the model
T1021.004	SSH	Full	Internal-to-internal SSH on port 22 is rare and high-signal
T1021.006	Windows Remote Management (WinRM)	Full	5985/5986 — pairs naturally with PowerShell remoting hunts
T1018	Remote System Discovery	Full	Out-degree explosion is the canonical scan signature
T1046	Network Service Discovery	Full	Multi-port probing surfaces in protocol_diversity feature
T1570	Lateral Tool Transfer	Full	Byte-weight on lateral edges spikes when payloads move
T1210	Exploitation of Remote Services	Partial	Surface — exploit success is what creates the new edge; pre-exploit recon also surfaces
T1550.002	Pass the Hash	Partial	Network footprint identical to legitimate SMB — pair with auth-event hunts
T1550.003	Pass the Ticket	Partial	—
T1558	Steal or Forge Kerberos Tickets	Partial	Kerberoasting traffic (port 88) flagged when paired with anomalous LDAP enumeration
T1558.003	Kerberoasting	Partial	—
T1078	Valid Accounts	Partial	The hardest case — legitimate creds, legitimate ports, abnormal graph position
T1059.001	Command and Scripting Interpreter: PowerShell	Partial	WinRM channel covered at the network level; payload analysis needs EDR
T1047	Windows Management Instrumentation	Partial	WMI over DCOM (port 135) surfaces in protocol_diversity
T1219	Remote Access Software (legitimate RMM abuse)	Partial	AnyDesk, RustDesk, and commercial remote-access tools surface via destination-port anomaly
T1572	Protocol Tunneling	Out of scope	SSH/ICMP tunnels need post #3’s covert-channel hunt

Adversary emulation / purple-team validation. The high-value public adversary-emulation atomic tests for this detection are T1021.002 (SMB), T1021.006 (WinRM), and T1046 (service discovery). For a realistic multi-stage chain, run a “discovery-and-lateral-movement” operation profile from an open-source adversary-emulation framework against a 3-tier lab segment. The graph anomaly score should peak as soon as the operator crosses the second segment boundary.

Sigma / detection-as-code. Output your graph anomaly events into the SIEM as structured fields — graph_pagerank_z, graph_betweenness_z, lateral_risk_score, attack_path — and write the Sigma rule as a simple threshold check. This separation keeps the heavy graph maths in Lambda/SageMaker and the alert logic in code review.
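
The structured event and the threshold logic a rule would encode can be sketched like this. The four graph field names come from the paragraph above; the host field, the function names, and the specific combination (either z-score over 3.0 and a risk score of at least 50) are assumptions for illustration, with 50 taken from the low end of the threshold range quoted earlier.

```python
import json


def build_event(host, pagerank_z, betweenness_z, risk_score, path):
    """Assemble the structured fields that are shipped to the SIEM."""
    return {
        "host": host,  # hypothetical field name
        "graph_pagerank_z": pagerank_z,
        "graph_betweenness_z": betweenness_z,
        "lateral_risk_score": risk_score,
        "attack_path": list(path),
    }


def threshold_check(event, z_threshold=3.0, risk_threshold=50.0):
    """Mirror of a simple rule: a centrality anomaly plus a risky path."""
    centrality_hit = (abs(event["graph_pagerank_z"]) > z_threshold
                      or abs(event["graph_betweenness_z"]) > z_threshold)
    return centrality_hit and event["lateral_risk_score"] >= risk_threshold


event = build_event("10.0.1.5", 4.2, 1.1, 120.0,
                    ["10.0.1.5", "10.0.2.9", "10.0.9.20"])
print(json.dumps(event))
```

Because the event is plain JSON, the rule itself stays a field comparison that can be reviewed like any other detection-as-code change.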

D3FEND mappings. The pipeline implements D3-NTCD (Network Traffic Community Deviation) directly via Louvain community analysis, and D3-NTA (Network Traffic Analysis) as the umbrella defensive technique. Useful framing when you justify the investment to your CISO.

Where This Sits in a Mature Threat Hunting Programme

Graph-based lateral-movement detection pairs naturally with the other four detections in this VPC-Flow-Log series and with the broader hunt patterns already on HACKFORLAB:

Adaptive C2 beacon detection (FFT + DBSCAN): the command-and-control side.
VPC Flow Log attack hunting — volumetric and access-pattern hunts.
Outbound network threat hunting — destination enrichment.
Cloud attack threat hunting — identity and resource-level evidence.
Hunting AWS identity attacks — for the IAM side of the pivot.
AWS Bedrock CloudTrail playbook — for the GenAI service surface.
Authentication-event threat hunting — for the auth-side correlation.
Linux threat hunting with CUT, SORT, UNIQ, DIFF — for the host-side investigation once a lateral-movement alert lands.

Closing Thoughts

If your SOC is still hunting lateral movement one flow at a time, you are missing the structural signal that real attackers cannot hide. Graph methods are mature, the libraries are free, and the SQL above is the only piece of plumbing you need to drop the analytical layer in front of your existing VPC Flow Logs. The investment is two engineer-days; the payoff is detection coverage for an entire kill-chain phase that most enterprises currently miss.

Tune the parameters against your environment. Backtest against your own incidents. Send us your war stories. Happy threat hunting.

