Kubernetes East-West Attack Hunting from VPC Flow Logs — Pod-to-pod attack detection with namespace and service-mesh awareness


From the hunt desk. Kubernetes broke network detection. Pre-cluster, every server had a known IP, a known role, and an SOC that could write rules against the destination. Then we shipped EKS and AKS and GKE, ephemeral pods got recycled every twelve minutes, and every internal IP suddenly meant something completely different depending on the hour. Most SIEMs in production today essentially ignore east-west traffic inside Kubernetes clusters because the IP addresses are not stable enough to write rules against. Attackers know this. They land in one pod, move sideways through three more, exfil through a fourth, and leave before the next deployment cycle erases the evidence.

This post is the playbook for getting east-west detection back on the table. The core trick is joining VPC Flow Logs to the Kubernetes API metadata at query time — turning ephemeral IPs into stable identifiers (namespace, service account, deployment, pod-template-hash). Once that join is in place, every flow becomes attributable to a workload, and standard anomaly detection (community detection, edge-novelty, role deviation) works again. The pipeline below does exactly that, end-to-end, for EKS clusters on AWS.

This is post #10 in our VPC Flow Log detection-engineering series. Companion posts: lateral movement graph detection (the underlying technique), FFT C2 beacon detection, and LotL Markov kill chains.

Why Kubernetes Network Detection is Different

Three structural differences from traditional VPC network detection:

  • Pod IPs are ephemeral. A pod’s IP is assigned at scheduling time and released when the pod terminates. A rolling deploy of a 200-pod service rotates 200 IPs in five minutes. Anything stored as (srcaddr, dstaddr) is stale before the alert fires.
  • Service IPs add another indirection. A flow to a ClusterIP service is load-balanced across the service’s endpoints — the destination IP in the VPC Flow Log is the pod’s eni, but the workload-relevant target is the service.
  • Network policies are aspirational. Most clusters in production have no NetworkPolicy resources, or they have a “default deny” policy at the namespace level and nothing more specific. The traffic itself is the only ground truth.

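The good news: on EKS clusters running the AWS VPC CNI plugin, every pod gets a routable VPC IP address rather than an overlay address, so pod IPs are visible to VPC Flow Logs (in the pkt-srcaddr and pkt-dstaddr fields). The Flow Log record carries no Kubernetes metadata itself, but the Kubernetes API knows which namespace, service account, and pod-template-hash owned each IP at any given moment. The metadata is there. It just needs to be joined to the flow at query time.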

The Detection Pipeline

Kubernetes east-west detection pipeline — five-step architecture from VPC Flow Logs through pod-IP mapping to cross-namespace anomaly scoring
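  1. Ingest. VPC Flow Logs from the cluster’s VPC land in S3, Parquet, partitioned by day. Critically, use a custom log format that includes the v3+ fields: pkt-srcaddr and pkt-dstaddr (the packet-level addresses, which carry the pod IP when it differs from the node ENI’s primary IP), instance-id (the EC2 instance ID of the worker node), and tcp-flags.
  2. Pod-IP to namespace mapping. A separate scheduled job (a Lambda, or the in-cluster CronJob described in the next section) runs every 5 minutes against the EKS cluster’s Kubernetes API and exports a (pod_ip, namespace, service_account, deployment, pod_template_hash, node) table to S3. This is your translation table. The export interval bounds attribution accuracy: a pod that starts and terminates between two snapshots never appears in the table, so shorten the interval if your workloads churn faster than that.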
  3. Service-mesh graph baseline. Build a graph where nodes are deployments (not pods) and edges are accumulated flows between deployments. The deployment is the stable identity; the pod IPs underneath are ephemeral. Over 14 days, the graph stabilises into the actual service-call topology of the application.
  4. Cross-namespace anomaly score. Two complementary signals: (a) edge novelty — a new edge between deployments that has never existed in the baseline, especially when crossing a namespace boundary; (b) centrality deviation — a deployment whose PageRank or betweenness centrality changes significantly, indicating it has stepped into a new structural role.
  5. SecOps + platform-eng alert. Alerts route to both the SOC and the platform engineering team. Kubernetes detection requires platform-eng context to triage — the platform team knows which new edges are legitimate (a new microservice rollout) and which aren’t.
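Steps 3 and 4 can be sketched in a few lines of plain Python. This is a minimal illustration rather than the production pipeline: the flow dictionaries, the function names, and the choice of a hand-rolled power-iteration PageRank are all assumptions made for the example, and a real deployment would more likely compute this in a graph library or in SQL.

```python
from collections import defaultdict


def build_edges(flows):
    """Collapse labelled flows into a set of deployment-level edges."""
    edges = set()
    for f in flows:
        src = (f["src_ns"], f["src_dep"])
        dst = (f["dst_ns"], f["dst_dep"])
        if src != dst:  # ignore replicas of one deployment talking to each other
            edges.add((src, dst))
    return edges


def novel_edges(baseline, current):
    """Edges present now but absent from the baseline, cross-namespace first."""
    def same_ns(edge):
        return edge[0][0] == edge[1][0]
    return sorted(current - baseline, key=lambda e: (same_ns(e), e))


def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a directed edge set."""
    nodes = sorted({n for e in edges for n in e})
    if not nodes:
        return {}
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out[node])
        nxt = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            for dst in out[src]:
                nxt[dst] += damping * rank[src] / len(out[src])
        rank = nxt
    return rank


def centrality_shift(baseline, current):
    """Per-deployment change in PageRank between baseline and current graphs."""
    before, after = pagerank(baseline), pagerank(current)
    return {node: after.get(node, 0.0) - before.get(node, 0.0)
            for node in set(before) | set(after)}


# Hypothetical example: a debug pod in "default" starts talking to the prod database.
baseline = build_edges([
    {"src_ns": "prod", "src_dep": "web", "dst_ns": "prod", "dst_dep": "api"},
    {"src_ns": "prod", "src_dep": "api", "dst_ns": "prod", "dst_dep": "db"},
])
current = baseline | build_edges([
    {"src_ns": "default", "src_dep": "debug", "dst_ns": "prod", "dst_dep": "db"},
])
new_edges = novel_edges(baseline, current)
shift = centrality_shift(baseline, current)
```

Because nodes are (namespace, deployment) pairs, a rolling deploy that replaces every pod IP leaves the graph unchanged. How large a centrality shift counts as "significant" is a tuning decision the sketch leaves open.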

Building the Pod-IP Translation Table

The cleanest implementation: a Kubernetes CronJob on the cluster runs every 5 minutes, queries the API for all pods across all namespaces, and writes the result to S3 as Parquet. Schema:

pod_ip          STRING   # key, together with sampled_at (IPs repeat across snapshots)
namespace       STRING
service_account STRING
deployment      STRING
pod_template_hash STRING
pod_name        STRING
node_name       STRING
labels          MAP<STRING, STRING>   # full label set
created_at      TIMESTAMP
status          STRING
sampled_at      TIMESTAMP   # when this snapshot was taken
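The mapping from a pod object to one row of this schema is mostly field copying; the only derived column is the deployment name. The sketch below assumes pods arrive as dictionaries in the JSON shape the Kubernetes API returns, and it relies on the convention that a Deployment's ReplicaSet is named `<deployment>-<pod-template-hash>`. The function names are illustrative, and skipping host-network pods is a design choice of this example (their pod IP is the node's IP, which would mislabel node traffic).

```python
def deployment_from_pod(pod):
    """Return (deployment, pod_template_hash) for a pod dict, or (None, hash)."""
    meta = pod.get("metadata", {})
    labels = meta.get("labels") or {}
    pth = labels.get("pod-template-hash")
    for owner in meta.get("ownerReferences") or []:
        if owner.get("kind") == "ReplicaSet":
            rs_name = owner["name"]
            # ReplicaSets created by a Deployment are named <deployment>-<hash>.
            if pth and rs_name.endswith("-" + pth):
                return rs_name[: -(len(pth) + 1)], pth
            return rs_name, pth
    return None, pth  # bare pod, StatefulSet, DaemonSet, Job, ...


def pod_to_row(pod, sampled_at):
    """Build one pod_map row, or None if the pod has no usable pod IP."""
    meta = pod.get("metadata", {})
    spec = pod.get("spec", {})
    status = pod.get("status", {})
    pod_ip = status.get("podIP")
    if not pod_ip or spec.get("hostNetwork"):
        return None  # unscheduled pod, or a pod that shares the node's IP
    deployment, pth = deployment_from_pod(pod)
    return {
        "pod_ip": pod_ip,
        "namespace": meta.get("namespace"),
        "service_account": spec.get("serviceAccountName"),
        "deployment": deployment,
        "pod_template_hash": pth,
        "pod_name": meta.get("name"),
        "node_name": spec.get("nodeName"),
        "labels": dict(meta.get("labels") or {}),
        "created_at": meta.get("creationTimestamp"),
        "status": status.get("phase"),
        "sampled_at": sampled_at,
    }


# Hypothetical pod, trimmed to the fields the row needs.
example_pod = {
    "metadata": {
        "name": "checkout-7d9f8c6b5-x2kqp",
        "namespace": "prod",
        "labels": {"app": "checkout", "pod-template-hash": "7d9f8c6b5"},
        "creationTimestamp": "2026-05-14T09:30:00Z",
        "ownerReferences": [{"kind": "ReplicaSet", "name": "checkout-7d9f8c6b5"}],
    },
    "spec": {"serviceAccountName": "checkout-sa", "nodeName": "ip-10-0-0-10"},
    "status": {"podIP": "10.0.1.5", "phase": "Running"},
}
row = pod_to_row(example_pod, "2026-05-14T09:35:00Z")
```

Pods owned by something other than a ReplicaSet come back with deployment set to None here; extending the owner lookup to StatefulSets and DaemonSets is a natural next step if those workloads matter in your cluster.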

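Athena joins this table to VPC Flow Logs on the flow’s source address matching pod_map.pod_ip, with the flow’s start time falling in the half-open window from pod_map.sampled_at up to (but not including) pod_map.sampled_at + 5 minutes, so each flow matches exactly one snapshot. Two details matter: the Flow Log start field is epoch seconds and must be converted before it is compared with a timestamp, and on VPC CNI clusters the pod IP is reported in pkt-srcaddr / pkt-dstaddr, while srcaddr / dstaddr can show the node ENI’s primary address. The same join is repeated for the destination, and the result is a flow record decorated with both source and destination workload identity.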
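The snapshot-matching rule is easy to get subtly wrong, so here is the same logic as a small Python sketch. It assumes snapshots are held as a time-sorted list of (sampled_at epoch seconds, {pod_ip: row}) pairs; that layout, the `lookup` name, and the 300-second constant are assumptions of the example, not part of the Athena pipeline.

```python
import bisect

SNAPSHOT_SECONDS = 300  # export interval of the pod_map job


def lookup(snapshots, ip, flow_start):
    """Resolve an IP at a flow's start time against time-sorted snapshots.

    A flow belongs to the snapshot whose window
    [sampled_at, sampled_at + SNAPSHOT_SECONDS) contains flow_start.
    """
    times = [sampled_at for sampled_at, _ in snapshots]
    i = bisect.bisect_right(times, flow_start) - 1  # latest snapshot <= flow_start
    if i < 0:
        return None  # flow predates the first snapshot
    sampled_at, table = snapshots[i]
    if flow_start - sampled_at >= SNAPSHOT_SECONDS:
        return None  # export gap: the nearest snapshot is too old to trust
    return table.get(ip)


# Hypothetical case: 10.0.1.5 is recycled from checkout to payments between snapshots.
snapshots = [
    (1000, {"10.0.1.5": {"namespace": "prod", "deployment": "checkout"}}),
    (1300, {"10.0.1.5": {"namespace": "prod", "deployment": "payments"}}),
]
before_recycle = lookup(snapshots, "10.0.1.5", 1299)
after_recycle = lookup(snapshots, "10.0.1.5", 1300)
```

The half-open window is what keeps a flow at exactly t = 1300 from matching both snapshots, which an inclusive BETWEEN on both ends would allow.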

For services with stable cluster IPs, a parallel table maps service IP to service name, namespace, and selected pods. Most analyses operate on the deployment level though — services are too coarse and pods are too fine.

Athena SQL — Cross-Namespace Edge Hunt

-- Assumes start is epoch seconds (BIGINT) and the table exposes the v3
-- packet-level fields as pkt_srcaddr / pkt_dstaddr; adjust to your schema.
WITH labeled_flows AS (
    SELECT from_unixtime(f.start) AS flow_start,
           f.bytes, f.packets, f.dstport,
           src.namespace       AS src_ns,
           src.deployment      AS src_dep,
           src.service_account AS src_sa,
           dst.namespace       AS dst_ns,
           dst.deployment      AS dst_dep,
           dst.service_account AS dst_sa
    FROM central_vpc_flow_logs f
    -- Inner joins: keep only flows where both ends resolve to a known pod.
    -- Half-open window so a flow matches exactly one 5-minute snapshot.
    JOIN pod_map src ON f.pkt_srcaddr = src.pod_ip
                     AND from_unixtime(f.start) >= src.sampled_at
                     AND from_unixtime(f.start) <  src.sampled_at + INTERVAL '5' MINUTE
    JOIN pod_map dst ON f.pkt_dstaddr = dst.pod_ip
                     AND from_unixtime(f.start) >= dst.sampled_at
                     AND from_unixtime(f.start) <  dst.sampled_at + INTERVAL '5' MINUTE
    WHERE f.action = 'ACCEPT'
      AND f.day BETWEEN '2026/05/09' AND '2026/05/15'
),
edge_summary AS (
    SELECT src_dep, dst_dep, src_ns, dst_ns,
           COUNT(*)                AS flow_count,
           SUM(bytes)              AS total_bytes,
           COUNT(DISTINCT dstport) AS port_diversity,
           MIN(flow_start)         AS first_seen,
           MAX(flow_start)         AS last_seen,
           (src_ns <> dst_ns)      AS cross_namespace
    FROM labeled_flows
    GROUP BY src_dep, dst_dep, src_ns, dst_ns
)
SELECT *,
       (CASE WHEN cross_namespace THEN 3.0 ELSE 1.0 END) * flow_count * port_diversity AS east_west_risk_score
FROM edge_summary
-- First seen on the last day of the query window. Widen the day range above
-- to 14 days if "new" should mean "absent from the full baseline".
WHERE first_seen > TIMESTAMP '2026-05-14 00:00:00'
  AND (port_diversity >= 2 OR flow_count > 100)
ORDER BY east_west_risk_score DESC;

The result is your cross-namespace investigation queue: new edges between deployments, weighted by whether they cross a namespace boundary and by how many distinct destination ports they touch. A cross-namespace edge with two destination ports and 200 flows from a deployment that has never previously talked to the destination namespace is almost certainly worth looking at.
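To make the scoring concrete, the arithmetic from the final SELECT can be reproduced in a few lines of Python. The function names are illustrative; the weights and thresholds are the ones used in the query.

```python
def east_west_risk_score(flow_count, port_diversity, cross_namespace):
    """Same formula as the query: namespace weight x flows x distinct ports."""
    weight = 3.0 if cross_namespace else 1.0
    return weight * flow_count * port_diversity


def is_candidate(flow_count, port_diversity):
    """Same filter as the query: several ports, or one busy port."""
    return port_diversity >= 2 or flow_count > 100


# The edge described above: 200 flows, 2 destination ports, crossing a namespace.
cross_ns_score = east_west_risk_score(200, 2, cross_namespace=True)   # 3.0 * 200 * 2
same_ns_score = east_west_risk_score(200, 2, cross_namespace=False)   # 1.0 * 200 * 2
```

The cross-namespace edge scores 1200.0 against 400.0 for the same traffic inside one namespace. Because the score is linear in flow count, a chatty new edge can outrank a quiet but stranger one, so treat the ordering as a triage aid rather than a verdict.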

Specific Attack Patterns to Hunt

  • Pod escape → host-network access. A compromised pod that escapes its container will start communicating from the host’s IP, not the pod’s IP. Detection: source IP that is a worker node (not in pod_map) initiates internal traffic to other pods.
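  • Service-account token theft. A pod uses a stolen SA token to query the Kubernetes API. Detection: flow from a non-system pod to the kube-apiserver on port 443 when the pod’s deployment is not labelled with API-using behaviour. In-cluster clients dial the kubernetes service ClusterIP (typically 10.100.0.1 or 172.20.0.1 in EKS), but that address is rewritten on the node before the packet leaves, so in Flow Logs match on the IPs of the EKS control-plane ENIs in your VPC instead.
  • Cross-namespace lateral movement. The classic: a compromised pod in the default namespace starts talking to kube-system or production namespaces. Caught directly by the edge-novelty check combined with the cross_namespace weighting in the query above.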
  • Crypto-mining inside a pod. Sustained outbound Stratum-like traffic from a pod. Pair this hunt with the cloud cryptojacking detection pipeline.
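  • Sidecar abuse (Istio / Linkerd). Service-mesh sidecars normally proxy all traffic, but a compromised sidecar can attack neighbouring pods. Detection: pod-to-pod traffic that bypasses the expected sidecar (port 15001 / 15006 for Istio, 4143 for Linkerd). Istio’s 15001 and 15006 are redirect targets inside the pod’s own network namespace, so confirm which ports your mesh actually puts on the wire before keying Flow Log rules on them.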
  • NodePort abuse. External traffic reaching internal services via a misconfigured NodePort. Detection: external source addresses reaching pod IPs without going through the expected LoadBalancer ENI.
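Three of these hunts (node-IP source, pod-to-apiserver, external-to-NodePort) reduce to set membership and port checks on a single flow, which the following Python sketch illustrates. The `tag_flow` function, the tag names, and the input sets are assumptions of the example; in practice the sets would come from pod_map, the node inventory, and the cluster's API server endpoint addresses.

```python
import ipaddress

NODEPORT_RANGE = range(30000, 32768)  # default Kubernetes NodePort range


def tag_flow(flow, pod_ips, node_ips, apiserver_ips, vpc_cidr):
    """Return hunt tags for one flow dict with srcaddr, dstaddr, dstport."""
    src, dst, port = flow["srcaddr"], flow["dstaddr"], flow["dstport"]
    src_internal = ipaddress.ip_address(src) in ipaddress.ip_network(vpc_cidr)
    tags = []
    # Possible container escape: a worker node's own IP initiates traffic to a pod.
    if src in node_ips and dst in pod_ips:
        tags.append("node_ip_source")
    # Possible token abuse: a pod talks to the API server endpoint.
    if src in pod_ips and dst in apiserver_ips and port == 443:
        tags.append("pod_to_apiserver")
    # Possible NodePort abuse: an address outside the VPC hits a NodePort on a node.
    if not src_internal and dst in node_ips and port in NODEPORT_RANGE:
        tags.append("external_to_nodeport")
    return tags


# Hypothetical inventory for a small cluster in 10.0.0.0/16.
pod_ips = {"10.0.1.5", "10.0.2.9"}
node_ips = {"10.0.0.10"}
apiserver_ips = {"10.0.3.20"}
escape_tags = tag_flow({"srcaddr": "10.0.0.10", "dstaddr": "10.0.1.5", "dstport": 8080},
                       pod_ips, node_ips, apiserver_ips, "10.0.0.0/16")
```

These tags are candidates, not verdicts. Host-network daemons legitimately send from node IPs and many controllers legitimately call the API server, so each tag needs a per-deployment baseline before it is worth alerting on.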

Feature Engineering

| Feature | Source | Formula / method | What it captures |
|---|---|---|---|
| Deployment-level edge novelty | VPC Flow + pod_map | edge not in 14-day baseline | New service-to-service comms |
| Cross-namespace flag | pod_map join | src_ns != dst_ns boolean | Namespace boundary crossing |
| Service-account anomaly | pod_map + CloudTrail (IRSA) | SA-token usage outside normal scope | Token abuse |
| Sidecar bypass | VPC Flow (ports) | traffic skipping 15001/15006 in mesh-tagged pods | Mesh-aware detection |
| Pod-to-apiserver flag | VPC Flow (dst=apiserver IP) | boolean per pod | Kubernetes API misuse |
| Node IP source | VPC Flow + ENI tags | flow from worker-node IP rather than pod IP | Container escape signature |
| External to NodePort | VPC Flow | non-internal source → NodePort range (30000-32767) | NodePort abuse |
| Cluster-IP rarity | service_map baseline | service called by deployment not in 14d baseline | Service-mesh anomaly |

Practical Considerations for EKS Operators

  • VPC CNI vs alternative CNIs. The AWS VPC CNI gives each pod a routable VPC IP address, which is what makes pod traffic visible in VPC Flow Logs. CNIs running in overlay or encapsulated mode (Cilium or Calico with VXLAN or IP-in-IP, for example) hide pod identity behind node-to-node tunnels. For those clusters you need eBPF flow capture (Cilium Hubble, Falco) instead: the detection logic is similar but the data source changes.
  • Sampling and filtering. AWS VPC Flow Logs are not sampled at the source, but cost-driven setups often capture REJECT traffic only, or downsample the records (for example to 1:1000) before they reach the data lake. East-west detection on data thinned that far is largely useless; it needs full ACCEPT capture, or at least 1:10 sampling.
  • Log delivery latency. Delivery to S3 lags by roughly 10 minutes, on top of the 1-minute or 10-minute aggregation interval. Real-time detection for ephemeral pods requires eBPF-based capture for the critical paths and Flow-Log-based detection for the broad sweep.
  • Multi-cluster. Each cluster needs its own pod_map. The Athena schema should include a cluster_name column so a single query covers the whole fleet.

MITRE ATT&CK Techniques Covered (Kubernetes Matrix)

ATT&CK covers Kubernetes through its Containers matrix, which is a subset of the Enterprise matrix and shares its technique IDs. The IDs below are the Enterprise IDs, restricted to techniques that leave a network trace.

| ATT&CK ID | Technique / sub-technique | Coverage | Hunter notes |
|---|---|---|---|
| T1611 | Escape to Host | Partial | Worker-node-IP detection on egress; pair with EDR for full coverage |
| T1613 | Container and Resource Discovery | Full | Pod-to-apiserver enumeration visible |
| T1610 | Deploy Container | Partial | Pair with CloudTrail RunTask / EKS audit |
| T1525 | Implant Internal Image | Out of scope | Image-side; pair with ECR scanning |
| T1552.004 | Unsecured Credentials: Private Keys (SA token theft) | Partial | |
| T1552.007 | Unsecured Credentials: Container API | Full | Pod-to-apiserver flows |
| T1021 | Remote Services (pod-to-pod) | Full | Cross-namespace SMB/SSH/RDP-equivalents |
| T1046 | Network Service Discovery | Full | Pod-port-scanning behaviour |
| T1018 | Remote System Discovery | Full | Out-degree explosion |
| T1496 | Resource Hijacking | Full | Cryptojacking pods (see post #8) |
| T1071 | Application Layer Protocol | Full | |
| T1041 | Exfiltration Over C2 Channel | Full | Pod-as-pivot exfiltration |

Adversary emulation. Open-source Kubernetes attack frameworks let you emulate pod-escape and cross-namespace attacks safely in a lab cluster, and other open-source Kubernetes attack utilities emulate service-account-token theft. Public adversary-emulation atomics also include Kubernetes-specific tests under the T1611 / T1613 / T1610 series. Run them in a lab namespace and confirm the pipeline scores them.

D3FEND mapping. D3-NTA (Network Traffic Analysis) with Kubernetes scope.

Where This Sits in a Mature Threat Hunting Programme

Closing Thoughts

The core insight of this post is that Kubernetes network detection is solvable with the same techniques that already work outside Kubernetes — once you stop treating pod IPs as identifiers and start treating deployments as identifiers. The translation table is the missing piece. Build it, join it, and the entire graph-anomaly detection toolkit becomes available on the most ephemeral workloads in your environment. The SOC stops being blind to east-west traffic, and platform engineering stops being the only team that knows what’s happening inside the cluster.

Happy threat hunting.


