
From the hunt desk. Kubernetes broke network detection. Pre-cluster, every server had a known IP, a known role, and an SOC that could write rules against the destination. Then we shipped EKS and AKS and GKE, ephemeral pods got recycled every twelve minutes, and every internal IP suddenly meant something completely different depending on the hour. Most SIEMs in production today essentially ignore east-west traffic inside Kubernetes clusters because the IP addresses are not stable enough to write rules against. Attackers know this. They land in one pod, move sideways through three more, exfil through a fourth, and leave before the next deployment cycle erases the evidence.
This post is the playbook for getting east-west detection back on the table. The core trick is joining VPC Flow Logs to the Kubernetes API metadata at query time — turning ephemeral IPs into stable identifiers (namespace, service account, deployment, pod-template-hash). Once that join is in place, every flow becomes attributable to a workload, and standard anomaly detection (community detection, edge-novelty, role deviation) works again. The pipeline below does exactly that, end-to-end, for EKS clusters on AWS.
This is post #10 in our VPC Flow Log detection-engineering series. Companion posts: lateral movement graph detection (the underlying technique), FFT C2 beacon detection, and LotL Markov kill chains.
Why Kubernetes Network Detection is Different
Three structural differences from traditional VPC network detection:
- Pod IPs are ephemeral. A pod’s IP is assigned at scheduling time and released when the pod terminates. A rolling deploy of a 200-pod service rotates 200 IPs in five minutes. Anything stored as (srcaddr, dstaddr) is stale before the alert fires.
- Service IPs add another indirection. A flow to a ClusterIP service is load-balanced across the service’s endpoints: the destination IP in the VPC Flow Log is the backing pod’s address, but the workload-relevant target is the service.
- Network policies are aspirational. Most clusters in production have no NetworkPolicy resources, or they have a “default deny” policy at the namespace level and nothing more specific. The traffic itself is the only ground truth.
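The good news: in EKS clusters running the AWS VPC CNI plugin, every pod gets a routable VPC IP address (a secondary IP on one of the node’s ENIs, or a dedicated branch ENI when security groups for pods are enabled), so pod IPs appear directly in VPC Flow Logs. The Kubernetes API knows which namespace, service account, and pod-template-hash each of those IPs belongs to at any moment. The metadata is there. It just needs to be joined to the flow at query time.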
The Detection Pipeline

- Ingest. VPC Flow Logs from the cluster’s VPC land in S3 as Parquet, partitioned by day. Critically, use the v3+ flow log format, which adds fields such as instance-id and tcp-flags on top of the default interface-id (the ENI ID).
- Pod-IP to namespace mapping. A separate scheduled job (a Lambda, or the in-cluster CronJob described below) runs every 5 minutes against the EKS cluster’s Kubernetes API and exports a (pod_ip, namespace, service_account, deployment, pod_template_hash, node) table to S3. This is your translation table. It only needs to be updated as fast as pods cycle.
- Service-mesh graph baseline. Build a graph where nodes are deployments (not pods) and edges are accumulated flows between deployments. The deployment is the stable identity; the pod IPs underneath are ephemeral. Over 14 days, the graph stabilises into the actual service-call topology of the application.
- Cross-namespace anomaly score. Two complementary signals: (a) edge novelty — a new edge between deployments that has never existed in the baseline, especially when crossing a namespace boundary; (b) centrality deviation — a deployment whose PageRank or betweenness centrality changes significantly, indicating it has stepped into a new structural role.
- SecOps + platform-eng alert. Alerts route to both the SOC and the platform engineering team. Kubernetes detection requires platform-eng context to triage — the platform team knows which new edges are legitimate (a new microservice rollout) and which aren’t.
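The two signals in the anomaly-score step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the production scorer: the deployment names are hypothetical, edges are unweighted, and `pagerank`, `novel_edges`, and `centrality_shift` are helper names introduced here for the example.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (src, dst) deployment edges."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def novel_edges(baseline, current):
    """Edges present in the current window but absent from the baseline."""
    return sorted(set(current) - set(baseline))


def centrality_shift(baseline, current):
    """PageRank change per deployment; new deployments start from 0.0."""
    before, after = pagerank(baseline), pagerank(current)
    return {node: after[node] - before.get(node, 0.0) for node in after}


# Hypothetical deployments, written as "namespace/deployment".
baseline = {("web/frontend", "shop/cart"), ("shop/cart", "shop/orders-db")}
current = baseline | {("default/debug", "shop/orders-db")}

print(novel_edges(baseline, current))
print(centrality_shift(baseline, current))
```

In practice the edges would carry flow counts or bytes as weights, and the threshold for a "significant" shift would be tuned against the 14-day baseline rather than fixed up front.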
Building the Pod-IP Translation Table
The cleanest implementation: a Kubernetes CronJob on the cluster runs every 5 minutes, queries the API for all pods across all namespaces, and writes the result to S3 as Parquet. Schema:
pod_ip STRING # join key; unique only within one snapshot (pair with sampled_at)
namespace STRING
service_account STRING
deployment STRING
pod_template_hash STRING
pod_name STRING
node_name STRING
labels MAP<STRING, STRING> # full label set
created_at TIMESTAMP
status STRING
sampled_at TIMESTAMP # when this snapshot was taken
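A rough sketch of how one pod object could be flattened into a row of this schema. It assumes the input is a pod in the JSON shape the Kubernetes API returns (the same shape as items from `kubectl get pods -A -o json`); the helper names `pod_to_row` and `snapshot_rows` and the sample pod are hypothetical. Skipping host-network pods is also an assumption made here, since those pods report the node's IP rather than their own.

```python
def pod_to_row(pod, sampled_at):
    """Flatten one pod object (Kubernetes API JSON shape) into a pod_map row."""
    meta = pod["metadata"]
    spec = pod.get("spec", {})
    status = pod.get("status", {})
    labels = meta.get("labels", {})
    template_hash = labels.get("pod-template-hash")

    deployment = None
    for owner in meta.get("ownerReferences", []):
        # Deployment-managed ReplicaSets are named <deployment>-<pod-template-hash>.
        suffix = f"-{template_hash}"
        if owner.get("kind") == "ReplicaSet" and template_hash and owner["name"].endswith(suffix):
            deployment = owner["name"][: -len(suffix)]

    return {
        "pod_ip": status.get("podIP"),
        "namespace": meta["namespace"],
        "service_account": spec.get("serviceAccountName"),
        "deployment": deployment,
        "pod_template_hash": template_hash,
        "pod_name": meta["name"],
        "node_name": spec.get("nodeName"),
        "labels": labels,
        "created_at": meta.get("creationTimestamp"),
        "status": status.get("phase"),
        "sampled_at": sampled_at,
    }


def snapshot_rows(pods, sampled_at):
    """Keep pods that own a pod IP; host-network pods share the node's IP."""
    return [
        pod_to_row(pod, sampled_at)
        for pod in pods
        if pod.get("status", {}).get("podIP") and not pod.get("spec", {}).get("hostNetwork")
    ]


# Hypothetical pod, trimmed to the fields the schema needs.
example_pod = {
    "metadata": {
        "name": "cart-7d9f8c6b5-x2k4q",
        "namespace": "shop",
        "labels": {"app": "cart", "pod-template-hash": "7d9f8c6b5"},
        "ownerReferences": [{"kind": "ReplicaSet", "name": "cart-7d9f8c6b5"}],
        "creationTimestamp": "2026-05-14T09:12:00Z",
    },
    "spec": {"serviceAccountName": "cart-sa", "nodeName": "ip-10-0-1-5.ec2.internal"},
    "status": {"podIP": "10.0.1.23", "phase": "Running"},
}

rows = snapshot_rows([example_pod], "2026-05-14T09:15:00Z")
print(rows[0]["deployment"], rows[0]["pod_ip"])
```

Pods owned by a StatefulSet, DaemonSet, or Job carry no pod-template-hash in this sense, so `deployment` stays empty for them here; a real exporter would need its own rule for those owners.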
Athena joins this table to VPC Flow Logs on flow.srcaddr = pod_map.pod_ip, with the flow’s start time falling within the five minutes after pod_map.sampled_at, so that each flow is matched to the snapshot that was current when the flow was recorded. The same join on flow.dstaddr labels the destination. The result is a flow record decorated with both source and destination workload identity.
For services with stable cluster IPs, a parallel table maps service IP to service name, namespace, and selected pods. Most analyses operate on the deployment level though — services are too coarse and pods are too fine.
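To make the snapshot-matching rule concrete, here is a small Python sketch of the same lookup outside Athena. It assumes five-minute snapshots and treats each window as half-open, so a flow that lands exactly on a snapshot boundary resolves to a single row; `build_index`, `resolve`, and the sample rows are hypothetical names and data used only for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(minutes=5)


def build_index(pod_map_rows):
    """Group pod_map rows by pod IP, each group ordered by sampled_at."""
    index = {}
    for row in sorted(pod_map_rows, key=lambda r: r["sampled_at"]):
        index.setdefault(row["pod_ip"], []).append(row)
    return index


def resolve(index, ip, flow_time):
    """Return the row whose window [sampled_at, sampled_at + 5 min) holds flow_time."""
    rows = index.get(ip, [])
    times = [row["sampled_at"] for row in rows]
    pos = bisect_right(times, flow_time) - 1
    if pos < 0:
        return None  # flow predates every snapshot for this IP
    row = rows[pos]
    if flow_time - row["sampled_at"] >= SNAPSHOT_INTERVAL:
        return None  # latest snapshot is too old to trust
    return row


# The same IP is reused by two different workloads five minutes apart.
pod_map = [
    {"pod_ip": "10.0.1.23", "deployment": "cart", "sampled_at": datetime(2026, 5, 14, 12, 0)},
    {"pod_ip": "10.0.1.23", "deployment": "debug", "sampled_at": datetime(2026, 5, 14, 12, 5)},
]
index = build_index(pod_map)
print(resolve(index, "10.0.1.23", datetime(2026, 5, 14, 12, 3))["deployment"])
print(resolve(index, "10.0.1.23", datetime(2026, 5, 14, 12, 5))["deployment"])
```

The two lookups return different deployments for the same IP, which is exactly the ambiguity that makes raw (srcaddr, dstaddr) rules unreliable.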
Athena SQL — Cross-Namespace Edge Hunt
WITH labeled_flows AS (
    -- Decorate each accepted flow with source and destination workload identity.
    -- Assumes f.start is epoch seconds, as in the default flow log schema;
    -- drop from_unixtime() if your table already stores a timestamp.
    SELECT from_unixtime(f.start) AS flow_start, f.bytes, f.packets, f.dstport,
           src.namespace AS src_ns,
           src.deployment AS src_dep,
           src.service_account AS src_sa,
           dst.namespace AS dst_ns,
           dst.deployment AS dst_dep,
           dst.service_account AS dst_sa
    FROM central_vpc_flow_logs f
    -- Half-open window so a flow on a snapshot boundary matches one snapshot only.
    LEFT JOIN pod_map src ON f.srcaddr = src.pod_ip
        AND from_unixtime(f.start) >= src.sampled_at
        AND from_unixtime(f.start) < src.sampled_at + INTERVAL '5' MINUTE
    LEFT JOIN pod_map dst ON f.dstaddr = dst.pod_ip
        AND from_unixtime(f.start) >= dst.sampled_at
        AND from_unixtime(f.start) < dst.sampled_at + INTERVAL '5' MINUTE
    WHERE f.action = 'ACCEPT'
      AND src.namespace IS NOT NULL   -- keep pod-to-pod flows only
      AND dst.namespace IS NOT NULL
      AND f.day BETWEEN '2026/05/09' AND '2026/05/15'
),
edge_summary AS (
    -- One row per deployment-to-deployment edge.
    SELECT src_dep, dst_dep, src_ns, dst_ns,
           COUNT(*) AS flow_count,
           SUM(bytes) AS total_bytes,
           COUNT(DISTINCT dstport) AS port_diversity,
           MIN(flow_start) AS first_seen,
           MAX(flow_start) AS last_seen,
           (src_ns <> dst_ns) AS cross_namespace
    FROM labeled_flows
    GROUP BY src_dep, dst_dep, src_ns, dst_ns
)
SELECT *,
       (CASE WHEN cross_namespace THEN 3.0 ELSE 1.0 END) * flow_count * port_diversity AS east_west_risk_score
FROM edge_summary
WHERE first_seen > TIMESTAMP '2026-05-14 00:00:00' -- edges first seen after this cutoff
  AND (port_diversity >= 2 OR flow_count > 100)
ORDER BY east_west_risk_score DESC;
The result is your investigation queue: new edges between deployments, weighted by whether they cross a namespace boundary and by how many distinct destination ports they use. A cross-namespace edge with two destination ports and 200 flows from a deployment that has never previously talked to the destination namespace is almost certainly worth looking at.
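As a quick worked example of the scoring arithmetic, the snippet below mirrors the query's expression in Python. The function name and the sample numbers are illustrative only; the 3.0 multiplier is the cross-namespace weight from the query.

```python
def east_west_risk_score(flow_count, port_diversity, cross_namespace):
    """Mirror of the query's score: namespace weight * flows * distinct ports."""
    weight = 3.0 if cross_namespace else 1.0
    return weight * flow_count * port_diversity


# The edge described above: 200 flows over two destination ports.
print(east_west_risk_score(200, 2, cross_namespace=True))   # 1200.0
print(east_west_risk_score(200, 2, cross_namespace=False))  # 400.0
```

Because the score is a plain product, a chatty but legitimate new edge can outrank a quiet malicious one, which is one reason the queue is meant for triage, not automatic blocking.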
Specific Attack Patterns to Hunt
- Pod escape → host-network access. A compromised pod that escapes its container will start communicating from the host’s IP, not the pod’s IP. Detection: source IP that is a worker node (not in pod_map) initiates internal traffic to other pods.
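- Service-account token theft. A pod uses a stolen SA token to query the Kubernetes API. Detection: flow from a non-system pod to the kube-apiserver when the pod’s deployment is not labelled with API-using behaviour. Inside the cluster the API server is reached through the kubernetes service ClusterIP (typically 10.100.0.1:443 in EKS); because service traffic is rewritten to its backing endpoint, expect the flow log to show the API server endpoint addresses on port 443, so keep a list of those.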
- Cross-namespace lateral movement. The classic: a compromised pod in the default namespace starts talking to kube-system or production namespaces. Caught directly by the edge-novelty signal combined with the cross_namespace weighting in the risk score.
- Crypto-mining inside a pod. Sustained outbound Stratum-like traffic from a pod. Pair this hunt with the cloud cryptojacking detection pipeline.
- Sidecar abuse (Istio / Linkerd). Service-mesh sidecars normally proxy all traffic, but a compromised sidecar can attack neighbouring pods. Detection: pod-to-pod traffic that bypasses the expected sidecar (port 15001 / 15006 for Istio, 4143 for Linkerd).
- NodePort abuse. External traffic reaching internal services via a misconfigured NodePort. Detection: external source addresses reaching pod IPs without going through the expected LoadBalancer ENI.
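Two of these hunts (host-network access after a pod escape, and NodePort abuse) reduce to classifying addresses, which can be sketched as follows. This is a hedged illustration: `hunt_flags`, the IP sets, and the CIDR are hypothetical inputs you would derive from pod_map and your node inventory, and the NodePort test uses the default 30000-32767 range. Node-to-pod traffic also occurs legitimately (kubelet health probes, for instance), so the first flag marks a candidate to baseline, not a verdict.

```python
import ipaddress

NODEPORT_RANGE = range(30000, 32768)  # default Kubernetes NodePort range


def hunt_flags(flow, pod_ips, node_ips, internal_cidrs):
    """Flag a flow record as a candidate for two of the hunts above."""
    src = ipaddress.ip_address(flow["srcaddr"])
    internal_src = any(src in ipaddress.ip_network(cidr) for cidr in internal_cidrs)
    return {
        # Worker-node IP (not a pod IP) initiating traffic to a pod.
        "node_ip_source": flow["srcaddr"] in node_ips and flow["dstaddr"] in pod_ips,
        # Source outside the VPC reaching a port in the NodePort range.
        "external_to_nodeport": not internal_src and flow["dstport"] in NODEPORT_RANGE,
    }


# Hypothetical inventory for one cluster.
pod_ips = {"10.0.1.23", "10.0.2.40"}
node_ips = {"10.0.1.5"}
internal_cidrs = ["10.0.0.0/16"]

escape_like = {"srcaddr": "10.0.1.5", "dstaddr": "10.0.2.40", "dstport": 5432}
nodeport_hit = {"srcaddr": "203.0.113.9", "dstaddr": "10.0.1.5", "dstport": 31080}

print(hunt_flags(escape_like, pod_ips, node_ips, internal_cidrs))
print(hunt_flags(nodeport_hit, pod_ips, node_ips, internal_cidrs))
```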
Feature Engineering
| Feature | Source | Formula / method | What it captures |
|---|---|---|---|
| Deployment-level edge novelty | VPC Flow + pod_map | edge not in 14-day baseline | New service-to-service comms |
| Cross-namespace flag | pod_map join | src_ns != dst_ns boolean | Namespace boundary crossing |
| Service-account anomaly | pod_map + CloudTrail (IRSA) | SA-token usage outside normal scope | Token abuse |
| Sidecar bypass | VPC Flow (ports) | traffic skipping 15001/15006 in mesh-tagged pods | Mesh-aware detection |
| Pod-to-apiserver flag | VPC Flow (dst=apiserver IP) | boolean per pod | Kubernetes API misuse |
| Node IP source | VPC Flow + ENI tags | flow from worker-node IP rather than pod IP | Container escape signature |
| External to NodePort | VPC Flow | non-internal source → NodePort range (30000-32767) | NodePort abuse |
| Cluster-IP rarity | service_map baseline | service called by deployment not in 14d baseline | Service-mesh anomaly |
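A few of the rows above can be computed per labelled flow with very little code. The sketch below covers three of them (edge novelty, cross-namespace flag, pod-to-apiserver flag) and assumes inputs that are not defined in this post: a flow dictionary already decorated by the pod_map join, a set of baseline edges keyed by (namespace, deployment) pairs, and a set of API server endpoint IPs that you maintain yourself. All names are hypothetical.

```python
def edge_features(flow, baseline_edges, apiserver_ips):
    """Compute three boolean features for one labelled flow record."""
    src = (flow["src_ns"], flow["src_dep"])
    dst = (flow.get("dst_ns"), flow.get("dst_dep"))
    labelled_dst = dst[0] is not None
    return {
        # Pod-to-pod edge that the 14-day baseline has never seen.
        "edge_novelty": labelled_dst and (src, dst) not in baseline_edges,
        # Source and destination pods live in different namespaces.
        "cross_namespace": labelled_dst and src[0] != dst[0],
        # Destination is one of the known API server endpoint addresses.
        "pod_to_apiserver": flow["dstaddr"] in apiserver_ips and flow["dstport"] == 443,
    }


# Hypothetical baseline and flows.
baseline_edges = {(("web", "frontend"), ("shop", "cart"))}
apiserver_ips = {"10.0.3.200"}

known = {"src_ns": "web", "src_dep": "frontend", "dst_ns": "shop", "dst_dep": "cart",
         "dstaddr": "10.0.1.23", "dstport": 8080}
suspicious = {"src_ns": "default", "src_dep": "debug", "dst_ns": "shop", "dst_dep": "cart",
              "dstaddr": "10.0.1.23", "dstport": 22}
api_call = {"src_ns": "default", "src_dep": "debug", "dst_ns": None, "dst_dep": None,
            "dstaddr": "10.0.3.200", "dstport": 443}

for record in (known, suspicious, api_call):
    print(edge_features(record, baseline_edges, apiserver_ips))
```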
Practical Considerations for EKS Operators
- VPC CNI vs alternative CNIs. The AWS VPC CNI gives every pod a routable VPC IP address, which makes VPC Flow Logs natively per-pod. Cilium with kube-proxy replacement, Calico in native mode, and other CNIs use overlay networks that hide pod identity from VPC Flow Logs. For non-VPC-CNI clusters, you need eBPF flow capture (Cilium Hubble, Falco) instead; the detection logic is similar but the data source changes.
- Sample rate. Some accounts enable VPC Flow Logs at 1:1000 sampling for cost reasons. East-west detection at that sample rate is largely useless; the detection requires full capture or at least 1:10 sampling.
- Log delivery latency. VPC Flow Logs have ~10-minute delivery latency. Real-time detection for ephemeral pods requires eBPF-based capture for the critical paths and Flow-Log-based detection for the broad sweep.
- Multi-cluster. Each cluster needs its own pod_map. The Athena schema should include a cluster_name column so a single query covers the whole fleet.
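The sample-rate point is easy to quantify. Under the simplifying assumption that each flow is kept independently with probability equal to the sample rate (an assumption made here for the arithmetic, not a statement about how any particular collector samples), the chance of seeing at least one flow on a new edge is 1 - (1 - rate)^n.

```python
def detection_probability(flows_on_edge, sample_rate):
    """Chance that at least one of n flows survives independent sampling."""
    return 1.0 - (1.0 - sample_rate) ** flows_on_edge


# A new edge carrying 200 flows, at the two sample rates discussed above.
for rate in (1 / 1000, 1 / 10):
    print(f"1:{round(1 / rate)} sampling -> {detection_probability(200, rate):.4f}")
```

At 1:1000 a 200-flow edge is observed only about 18% of the time, and an edge-novelty detector cannot score an edge it never sees; at 1:10 the same edge is all but certain to appear.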
MITRE ATT&CK Techniques Covered (Kubernetes Matrix)
The ATT&CK Containers matrix, which covers Kubernetes, is built from Enterprise ATT&CK techniques, so the cluster-specific tactics below are listed under their standard Enterprise technique IDs.
| ATT&CK ID | Technique / sub-technique | Coverage | Hunter notes |
|---|---|---|---|
| T1611 | Escape to Host | Partial | Worker-node-IP detection on egress; pair with EDR for full coverage |
| T1613 | Container and Resource Discovery | Full | Pod-to-apiserver enumeration visible |
| T1610 | Deploy Container | Partial | Pair with CloudTrail RunTask / EKS audit |
| T1525 | Implant Internal Image | Out of scope | Image-side; pair with ECR scanning |
| T1552.004 | Unsecured Credentials: Private Keys (SA token theft) | Partial | — |
| T1552.007 | Container API | Full | Pod-to-apiserver flows |
| T1021 | Remote Services (pod-to-pod) | Full | Cross-namespace SMB/SSH/RDP-equivalents |
| T1046 | Network Service Discovery | Full | Pod-port-scanning behaviour |
| T1018 | Remote System Discovery | Full | Out-degree explosion |
| T1496 | Resource Hijacking | Full | Cryptojacking pods — see post #8 |
| T1071 | Application Layer Protocol | Full | — |
| T1041 | Exfiltration Over C2 Channel | Full | Pod-as-pivot exfiltration |
Adversary emulation. Open-source Kubernetes attack frameworks let you emulate pod-escape and cross-namespace attacks safely in a lab cluster, and open-source Kubernetes attack utilities can emulate service-account-token theft. Public adversary-emulation atomics also include Kubernetes-specific tests under the T1611 / T1613 / T1610 series. Run them in a lab namespace and confirm the pipeline scores them.
D3FEND mapping. D3-NTA (Network Traffic Analysis) with Kubernetes scope.
Where This Sits in a Mature Threat Hunting Programme
- Lateral movement graph detection — direct parent technique; this post applies it to Kubernetes.
- FFT C2 beacon detection — pods with periodic egress.
- TLS fingerprinting — sidecar TLS fingerprinting in a service mesh.
- Cryptojacking detection — pods are a common cryptojacking surface.
- Hunting AWS identity attacks — IRSA and pod-level identity intersect here.
Closing Thoughts
The core insight of this post is that Kubernetes network detection is solvable with the same techniques that already work outside Kubernetes — once you stop treating pod IPs as identifiers and start treating deployments as identifiers. The translation table is the missing piece. Build it, join it, and the entire graph-anomaly detection toolkit becomes available on the most ephemeral workloads in your environment. The SOC stops being blind to east-west traffic, and platform engineering stops being the only team that knows what’s happening inside the cluster.
Happy threat hunting.
#threathunting #kubernetes #eks #containersecurity #vpcflowlogs #awssecurity #podsecurity #servicemesh #soc #blueteam #detectionengineering #mitreattack










