Insider Threat Detection from VPC Flow Logs (UEBA Without Endpoints)


From the hunt desk. The first insider case I worked on was a developer who had given notice two weeks earlier. EDR was clean. CloudTrail looked normal. The HR-side off-boarding workflow was on schedule. What gave it away — eventually — was that the engineer’s bastion host had started reaching production database read-replicas it had no business touching, at hours that did not match any project they were on, with byte volumes that did not match any code-review export they had ever done before. The signal was in VPC Flow Logs. Nobody had a pipeline looking for it.

This post is the playbook for building a User and Entity Behaviour Analytics (UEBA) layer on top of VPC Flow Logs alone, without depending on endpoint agents. The pipeline pivots from host-as-entity to identity-as-entity — every flow is attributed to the IAM principal that owns the source resource, and the analytics layer compares each principal’s current behaviour against three baselines: their own 90-day history, their peer group’s behaviour, and their role’s expected behaviour. The output is a per-identity risk score that surfaces joiner/mover/leaver insider-threat patterns the network-as-network detections in this series will miss.

This is post #11 in our VPC Flow Log detection-engineering series. It is deliberately orthogonal to low-and-slow exfiltration detection: that pipeline catches the technique regardless of who uses it, while this pipeline catches the person regardless of technique. The two are complementary.

Why Network-Only UEBA is Worth Building

Most commercial UEBA platforms are endpoint-led. They ingest Windows Security logs, Sysmon, EDR telemetry, badge data, HR feeds. They are excellent, but they are expensive, slow to deploy, and entirely blind to anything that happens off-endpoint. Three structural cases break them:

Cloud workloads with no agent. A developer connecting to an EC2 instance via SSM Session Manager from a laptop the EDR can see — the EDR sees the laptop, the SIEM sees the CloudTrail event, but neither sees what the engineer does inside the instance, which is where the actual data access happens.
Contractor and third-party access. Vendors connecting through your VPN or your IdP-federated SSO are not on your EDR fleet. CloudTrail captures their AssumeRole. VPC Flow Logs capture what they do after that. Nothing else does.
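Privileged-user data-plane activity. A DBA running a long-running SELECT against production through a bastion host produces no CloudTrail or EDR event for the query itself. The SELECT is a data-plane operation, so it does not show up in CloudTrail, and the bastion’s auth log shows the login but not the volume of data moved over the encrypted session. Unless database audit logging is enabled and collected, VPC Flow Logs are the only record of how much data moved.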

For all three cases, VPC Flow Logs are the only telemetry that captures the data-plane action. Build the UEBA layer on top of them and you cover the cases the endpoint-led platform cannot reach.

Pivoting From Host-as-Entity to Identity-as-Entity

The single biggest design decision: every other post in this series treats the host (source IP) as the unit of analysis. This post does not. Hosts are ephemeral, shared across users, and weakly attributable. Identity is durable, owned, and the actual subject of HR / legal / response actions.

The translation works because every flow in your VPC can be traced back to an identity by following two enrichment hops:

Source IP → instance ID via the VPC Flow Log instance_id field or the ENI tag lookup table.
Instance ID → IAM principal via CloudTrail RunInstances / StartSession / AssumeRole events that started or accessed the instance, plus IRSA bindings for EKS pods.

Bastion hosts and shared servers are the hardest case — multiple users sharing a single source IP. For these, you need session-aware attribution: CloudTrail SSM Session Manager events give you the user → instance binding with timestamps, so a flow from the bastion at t is attributed to the SSM session that was active at t. For SSH-based bastions without SSM, the bastion’s syslog needs to be ingested in parallel to provide the same mapping.
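The session-aware attribution described above can be sketched as a lookup from (source IP, timestamp) to the session active at that moment. This is a minimal sketch, not the production Lambda: the `Session` and `SessionIndex` names are hypothetical, timestamps are assumed to be epoch seconds, and the rule of refusing to attribute a flow when two sessions overlap on the same bastion is an assumption added here, since the source IP alone cannot separate concurrent users.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    """One interactive session on a shared host (hypothetical schema)."""
    source_ip: str
    identity_arn: str
    start: int  # epoch seconds, inclusive
    end: int    # epoch seconds, inclusive


class SessionIndex:
    """Maps (source_ip, timestamp) to the sessions active at that moment."""

    def __init__(self, sessions: list[Session]) -> None:
        self._by_ip: dict[str, list[Session]] = {}
        for s in sessions:
            self._by_ip.setdefault(s.source_ip, []).append(s)
        for ip_sessions in self._by_ip.values():
            ip_sessions.sort(key=lambda s: s.start)

    def active(self, source_ip: str, ts: int) -> list[Session]:
        sessions = self._by_ip.get(source_ip, [])
        starts = [s.start for s in sessions]
        # Only sessions that started at or before ts can still be active.
        candidates = sessions[:bisect_right(starts, ts)]
        return [s for s in candidates if s.end >= ts]

    def attribute(self, source_ip: str, ts: int) -> Optional[str]:
        """Return an identity only when exactly one session was active."""
        active = self.active(source_ip, ts)
        if len(active) == 1:
            return active[0].identity_arn
        return None  # no session, or overlapping sessions: ambiguous
```

Flows that come back as `None` are worth counting per bastion: a high ambiguous share suggests the host needs per-user source ports, per-user ENIs, or the syslog feed mentioned above before its flows can be scored per identity.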

The Three Baselines Behind a UEBA Score

Figure: Insider threat UEBA pipeline, a five-step architecture from VPC Flow plus IAM tags through peer-group baseline to fusion alert.

The detection engine does not just compare identity X today to identity X yesterday. That comparison breaks on joiners, who have no baseline yet, and misses steady drift over months. It uses three baselines simultaneously:

Self-baseline (90 days). The identity’s own historical behaviour. Captures personal habits — which databases this user normally queries, which hours, which volumes. Drift against this baseline catches the slow-burn insider whose behaviour gradually changes after they have decided to leave.
Peer-group baseline (rolling 30 days). Behaviour of identities with the same role tag (DBA, SRE, ML engineer, BI analyst, contractor). The peer group is defined by IAM role and HR-system role tags. Catches the identity who is doing things technically permitted by their role but unusual relative to their actual peers — the contractor who runs queries no other contractor runs, the SRE who exports volumes no other SRE exports.
Role-expectation baseline (static, policy-defined). What this role should be doing. Defined by the security and data-governance teams. Catches the misuse-of-privilege case where the role’s permissions are too broad and the user is exercising the slack: activity that is legal, and sometimes even ethical, but visibly different from documented intent.

The fusion logic combines all three. An identity whose behaviour differs from its self-baseline and its peer baseline and its role-expectation baseline is a high-confidence alert. Two of three is investigation queue. One of three is contextual noise.
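The three-of-three, two-of-three, one-of-three rule is simple enough to sketch directly. This is an illustration under stated assumptions: the z-score thresholds (2 for self, 2.5 for peer) are the ones used in the Athena query in this post, the role-expectation check is reduced to a boolean because the post does not give a threshold for it, and the function and tier names are hypothetical. A missing self-baseline (a joiner) is treated as "no deviation", which is also an assumption.

```python
from typing import Optional

SELF_Z_THRESHOLD = 2.0   # same cut-off as the Athena query in this post
PEER_Z_THRESHOLD = 2.5   # same cut-off as the Athena query in this post

# Tier names are hypothetical labels for the three outcomes in the text.
TIERS = {
    3: "high_confidence_alert",
    2: "investigation_queue",
    1: "contextual_noise",
    0: "no_signal",
}


def count_deviations(z_self: Optional[float],
                     z_peer: Optional[float],
                     role_deviates: bool) -> int:
    """Count how many of the three baselines the identity deviates from."""
    hits = 0
    if z_self is not None and z_self > SELF_Z_THRESHOLD:
        hits += 1
    if z_peer is not None and z_peer > PEER_Z_THRESHOLD:
        hits += 1
    if role_deviates:
        hits += 1
    return hits


def fuse(z_self: Optional[float],
         z_peer: Optional[float],
         role_deviates: bool) -> str:
    """Map the deviation count to a triage tier."""
    return TIERS[count_deviations(z_self, z_peer, role_deviates)]
```

One consequence worth noticing: with this rule a joiner can never reach the top tier, because the self-baseline vote is unavailable. Whether joiners should be scored two-of-two instead is a policy decision the sketch leaves open.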

The Joiner / Mover / Leaver Lifecycle

Every identity in an organisation passes through three states, and each state has a distinct insider-threat signature on the network:

Joiner (day 0 to ~30). No self-baseline yet. Detection lives entirely in the peer-group and role baselines. The high-risk pattern is a joiner who immediately exhibits power-user behaviour — access volumes that match a tenured engineer rather than a new hire.
Mover (role-change events from HR feed). Self-baseline is partly stale. The pipeline issues a “baseline grace period” of 14 days while the identity learns its new role. Outside grace, mover-state misuse looks like residual access to old systems — querying databases relevant to the previous team three weeks after the team change.
Leaver (resignation notice → final day). The single highest-signal state. From the date of notice until separation, the leaver’s risk weight is multiplied by a factor (we use 3.0 in production). HR feed integration is mandatory for this signal to fire; without it, leavers look like normal employees up until the day their access is revoked, by which point the exfil has already happened.

Connecting the HR feed to the UEBA pipeline takes one engineering week and produces the single largest gain in insider-threat detection coverage available to any organisation. If you have a modern HR systems API and a willing People-Ops team, do this before anything else.
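Once the HR feed is connected, deriving the lifecycle state and its weight is a small function. The sketch below assumes a hypothetical `HrRecord` shape. The 30-day joiner window, the 14-day mover grace period and the 3.0 leaver multiplier come from this post. Two things are assumptions: the mapping of 1.5 to movers and 1.0 to joiners (the post lists 1.0 / 1.5 / 3.0 without saying which state gets which), and treating an identity as a mover for as long as its role change falls inside the 90-day self-baseline window.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

JOINER_DAYS = 30        # "day 0 to ~30" in the post
MOVER_GRACE_DAYS = 14   # grace period from the post
BASELINE_DAYS = 90      # assumption: mover state lasts while old-role days
                        # are still inside the 90-day self-baseline

# Only the 3.0 leaver weight is stated explicitly in the post;
# the other assignments are assumptions made for this sketch.
MULTIPLIER = {"joiner": 1.0, "mover": 1.5, "leaver": 3.0, "established": 1.0}


@dataclass(frozen=True)
class HrRecord:
    """Hypothetical subset of fields pulled from the HR system."""
    hire_date: date
    last_role_change: Optional[date] = None
    notice_date: Optional[date] = None


def lifecycle_state(hr: HrRecord, today: date) -> str:
    # Leaver takes precedence over every other state.
    if hr.notice_date is not None and today >= hr.notice_date:
        return "leaver"
    if hr.last_role_change is not None:
        if 0 <= (today - hr.last_role_change).days < BASELINE_DAYS:
            return "mover"
    if (today - hr.hire_date).days <= JOINER_DAYS:
        return "joiner"
    return "established"


def in_mover_grace(hr: HrRecord, today: date) -> bool:
    """True while new-target alerts should be suppressed after a move."""
    if hr.last_role_change is None:
        return False
    return 0 <= (today - hr.last_role_change).days < MOVER_GRACE_DAYS


def weighted_risk(base_score: float, hr: HrRecord, today: date) -> float:
    return base_score * MULTIPLIER[lifecycle_state(hr, today)]
```

Keeping the multiplier table as data rather than code makes it easy to put under the same quarterly review as the role-expectation policy.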

Feature Engineering — Per-Identity Vector

Feature	Source	Formula / method	What it captures
Daily egress byte total	VPC Flow + IAM mapping	SUM(bytes) per identity per day	Gross data exposure
Internal target diversity	VPC Flow + IAM	distinct internal services touched per day	Lateral exposure
Database connection rate	VPC Flow (DB ports)	flow_count to RDS/Redshift IPs per identity per hour	Query workload signature
Off-hours activity ratio	VPC Flow + identity time-zone	bytes_22h-06h / bytes_total per identity	Working-hours deviation
Weekend activity flag	VPC Flow + calendar	boolean Sat/Sun activity	Schedule deviation
Geographic anomaly	VPC Flow + IAM federation source IP	distance from baseline source location	Travel / impossible-travel
Privilege-escalation flow	VPC Flow (sts/iam endpoints)	flows to AWS STS or IAM endpoints	Privilege-use signature
Bastion-session intensity	SSM Session Manager + VPC Flow	bytes per session, duration	Hands-on-keyboard signal
Peer-group deviation score	peer baseline + identity vector	Mahalanobis distance from peer centroid	Cross-peer abnormality
Role-expectation deviation	policy + identity vector	vector dot-product against role profile	Misuse-of-privilege signal
JML lifecycle multiplier	HR feed	1.0 / 1.5 / 3.0 based on state	Lifecycle-aware weighting
Token-rotation anomaly	CloudTrail GetSessionToken	frequency relative to baseline	Long-lived credential signal
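The peer-group deviation row names Mahalanobis distance from the peer centroid. A dependency-free sketch of that calculation follows, assuming each identity is already reduced to a small numeric vector (for example log-scaled bytes, distinct targets, DB flow count). The function names are hypothetical, and the small ridge term added to the covariance diagonal is an assumption added here to keep the matrix invertible when a peer group has a constant feature.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def mean_vector(rows: Sequence[Vector]) -> list[float]:
    n, k = len(rows), len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(k)]


def covariance(rows: Sequence[Vector], mu: Vector, ridge: float = 1e-9) -> list[list[float]]:
    """Sample covariance matrix with a small ridge on the diagonal."""
    n, k = len(rows), len(mu)
    if n < 2:
        raise ValueError("need at least two peers to estimate covariance")
    cov = [[0.0] * k for _ in range(k)]
    for r in rows:
        d = [r[j] - mu[j] for j in range(k)]
        for a in range(k):
            for b in range(k):
                cov[a][b] += d[a] * d[b]
    for a in range(k):
        for b in range(k):
            cov[a][b] /= (n - 1)
        cov[a][a] += ridge
    return cov


def solve(a: list[list[float]], b: Vector) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis(x: Vector, peers: Sequence[Vector]) -> float:
    """Distance of x from the peer centroid, scaled by peer covariance."""
    mu = mean_vector(peers)
    d = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(covariance(peers, mu), d)  # y = inverse(cov) @ d
    return math.sqrt(max(0.0, sum(di * yi for di, yi in zip(d, y))))
```

Unlike per-feature z-scores, this accounts for correlation between features: an SRE with high bytes and high target diversity is less surprising than one with high bytes against a single target, if the peer group shows the two moving together. Raw byte counts span many orders of magnitude, so log-scaling them before building the vector is usually sensible.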

Concept Drift — The Detail That Kills Most UEBA Deployments

Identity behaviour is not stationary. People change projects, learn new tools, get promoted into new domains. A naïve UEBA pipeline alerts on every legitimate role transition and gets disabled by the SOC within six weeks. The fix is to handle concept drift explicitly:

Decay weight on self-baseline. Older days count less. We use a 60-day exponential decay. Read as a half-life, that weights a day 60 days back at 0.5 of yesterday and a day 30 days back at roughly 0.71.
Sliding window on peer baseline. The peer group itself drifts as people join and leave teams. Refresh the peer-group membership monthly from HR data.
Role-expectation policy review. Quarterly review with security and data-governance leadership. If the role baseline is wrong, every alert is wrong.
Explicit grace periods. A documented mover event from HR triggers a 14-day grace period where new-target alerts are suppressed.
Feedback loop. Every dismissed alert gets a reason code. After 6 months, the reason-code distribution tells you which features are overfit; retrain without them.
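The decay weighting on the self-baseline can be sketched as a weighted mean and standard deviation. This assumes the 60-day figure is a half-life, which is one reading of "60-day exponential decay" and not something the post states; under a different convention (for example a 60-day time constant) the weights change. Function names are hypothetical, and `history[0]` is taken to be yesterday.

```python
import math
from typing import Optional, Sequence


def decay_weights(n_days: int, half_life_days: float = 60.0) -> list[float]:
    """Weight per day of history; index 0 is yesterday, index k is k days earlier."""
    return [0.5 ** (age / half_life_days) for age in range(n_days)]


def weighted_baseline(history: Sequence[float],
                      half_life_days: float = 60.0) -> tuple[float, float]:
    """Exponentially weighted mean and standard deviation of a daily feature."""
    w = decay_weights(len(history), half_life_days)
    total = sum(w)
    mean = sum(wi * x for wi, x in zip(w, history)) / total
    var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, history)) / total
    return mean, math.sqrt(var)


def decayed_z(today_value: float,
              history: Sequence[float],
              half_life_days: float = 60.0) -> Optional[float]:
    """z-score of today's value against the decayed self-baseline."""
    mean, std = weighted_baseline(history, half_life_days)
    # Treat a numerically flat history as "no usable spread".
    if std <= 1e-9 * max(1.0, abs(mean)):
        return None
    return (today_value - mean) / std
```

With a half-life of 60 days, `decay_weights(61)[60]` is 0.5 and `decay_weights(31)[30]` is about 0.71. Shortening the half-life makes the baseline follow legitimate project changes faster, at the cost of also adapting faster to a slow-build leaver.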

Athena SQL — Per-Identity Behavioural Vector

This query is heavier than the others in this series because it joins VPC Flow Logs to a per-flow identity mapping table (flow_identity_map) populated by an upstream Lambda that watches CloudTrail / SSM events.

-- Assumes flow_identity_map.session_start / session_end are epoch seconds,
-- like the flow log start field, and that day is a 'YYYY/MM/DD' string partition.
WITH identity_flows AS (
    SELECT fim.identity_arn, fim.role_tag, f.bytes, f.packets, f.dstaddr, f.dstport,
           f.start, f."end",                                   -- "end" is a reserved word in Athena and must be quoted
           HOUR(from_unixtime(f.start)) AS hour_of_day,        -- UTC; shift per identity time zone if needed
           DAY_OF_WEEK(from_unixtime(f.start)) AS dow          -- ISO numbering: 1 = Monday ... 7 = Sunday
    FROM central_vpc_flow_logs f
    JOIN flow_identity_map fim
      ON f.srcaddr = fim.source_ip
     AND f.start BETWEEN fim.session_start AND fim.session_end
    WHERE f.action = 'ACCEPT'
      -- Scan the whole self-baseline window; a one-week scan would leave
      -- self_baseline below with only a few days of history.
      AND f.day BETWEEN '2026/02/15' AND '2026/05/15'
),
daily_identity_vector AS (
    SELECT identity_arn, role_tag,
           DATE_FORMAT(from_unixtime(start), '%Y-%m-%d') AS d,
           SUM(bytes)                                                                 AS bytes_total,
           COUNT(DISTINCT dstaddr)                                                    AS distinct_targets,
           COUNT(*) FILTER (WHERE dstport IN (3306,5432,1433,27017,6379,5439))         AS db_flow_count,
           SUM(bytes) FILTER (WHERE hour_of_day >= 22 OR hour_of_day < 6)              AS bytes_offhours,
           SUM(bytes) FILTER (WHERE dow IN (6, 7))                                     AS bytes_weekend,  -- Saturday, Sunday
           COUNT(DISTINCT dstport)                                                    AS port_diversity
    FROM identity_flows
    GROUP BY identity_arn, role_tag, DATE_FORMAT(from_unixtime(start), '%Y-%m-%d')
),
peer_baseline AS (
    -- Rolling 30 days of peer behaviour, excluding the day being scored.
    SELECT role_tag,
           AVG(bytes_total)      AS peer_avg_bytes, STDDEV(bytes_total)      AS peer_std_bytes,
           AVG(distinct_targets) AS peer_avg_targets, STDDEV(distinct_targets) AS peer_std_targets,
           AVG(db_flow_count)    AS peer_avg_db,    STDDEV(db_flow_count)     AS peer_std_db
    FROM daily_identity_vector
    WHERE d >= '2026-04-15' AND d < '2026-05-15'
    GROUP BY role_tag
),
self_baseline AS (
    -- Roughly 90 days of the identity's own history, excluding the day being scored.
    SELECT identity_arn,
           AVG(bytes_total) AS self_avg_bytes, STDDEV(bytes_total) AS self_std_bytes
    FROM daily_identity_vector
    WHERE d BETWEEN '2026-02-15' AND '2026-05-14'
    GROUP BY identity_arn
)
SELECT v.identity_arn, v.role_tag, v.d, v.bytes_total,
       (v.bytes_total - sb.self_avg_bytes) / NULLIF(sb.self_std_bytes, 0) AS z_self_bytes,
       (v.bytes_total - pb.peer_avg_bytes) / NULLIF(pb.peer_std_bytes, 0) AS z_peer_bytes,
       (v.distinct_targets - pb.peer_avg_targets) / NULLIF(pb.peer_std_targets, 0) AS z_peer_targets,
       (v.db_flow_count - pb.peer_avg_db) / NULLIF(pb.peer_std_db, 0) AS z_peer_db
FROM daily_identity_vector v
LEFT JOIN self_baseline sb ON v.identity_arn = sb.identity_arn
LEFT JOIN peer_baseline pb ON v.role_tag = pb.role_tag
WHERE v.d = '2026-05-15'
  AND ((v.bytes_total - sb.self_avg_bytes) / NULLIF(sb.self_std_bytes, 0) > 2
    OR (v.bytes_total - pb.peer_avg_bytes) / NULLIF(pb.peer_std_bytes, 0) > 2.5)
ORDER BY z_peer_bytes DESC;

The result is your per-identity investigation queue. Each row tells you which identity, what role, how far the identity deviates from itself, and how far the identity deviates from its peers. The cases where both z-scores are elevated are the highest-confidence insider signal.

Specific Insider Patterns to Hunt

The slow-build leaver. Self-baseline drift over 14 days starting from a date close to a known HR notice event. Volume creeps up 5–10% per day rather than spiking; volumetric thresholds never trigger but the trajectory is unmistakable.
The contractor with a side project. Peer-deviation flags a contractor whose internal-target diversity is unusual for their cohort. Often benign (curious learner) but worth a manager conversation in either case.
Privilege misuse. Role-expectation deviation: an SRE running production queries that production analysts run, or a finance analyst hitting infrastructure APIs. The role policy says they can, the role expectation says they don’t.
Account takeover masquerading as insider. Self-deviation extreme; peer-deviation extreme; role-expectation deviation extreme — all three at once on an established identity. This is almost certainly not the legitimate user. Pair with CloudTrail GetSessionToken anomaly and federation source-IP geolocation drift for confirmation.
Off-boarded but not revoked. An identity flagged “leaver” in HR but still producing flows. The revocation workflow has a gap.
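The slow-build leaver is a trajectory rather than a threshold breach, so it needs a different test from the z-scores. One way to sketch it is a least-squares fit of log volume against day index over the 14-day window, flagging a sustained growth rate at or above the 5% per day mentioned above. This is an illustrative approach, not the method used in the production pipeline; function names are hypothetical, and windows containing zero-byte days (leave, weekends) are simply skipped, which a real implementation would need to handle better.

```python
import math
from typing import Sequence


def daily_growth_rate(daily_bytes: Sequence[float]) -> float:
    """Average daily growth fraction from a log-linear least-squares fit.

    Requires at least two values, all strictly positive.
    """
    ys = [math.log(b) for b in daily_bytes]
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys))
    slope = sxy / sxx              # change in ln(bytes) per day
    return math.exp(slope) - 1.0   # e.g. 0.07 means +7% per day


def is_slow_build(daily_bytes: Sequence[float],
                  window: int = 14,
                  min_growth: float = 0.05) -> bool:
    """True when the last `window` days show sustained compounding growth."""
    if len(daily_bytes) < window:
        return False
    recent = daily_bytes[-window:]
    if any(b <= 0 for b in recent):
        return False  # zero-byte days cannot be log-transformed; skipped here
    return daily_growth_rate(recent) >= min_growth
```

A series that grows 7% per day for 14 days ends at roughly 2.4 times its starting volume, which a per-day volumetric threshold tuned to tolerate normal variance can easily miss. Anchoring the start of the window to the HR notice date, where one exists, makes the check considerably more specific.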

MITRE ATT&CK Techniques Covered

ATT&CK ID	Technique / sub-technique	Coverage	Hunter notes
T1078	Valid Accounts	Full	The entire post is the detection layer for this
T1078.004	Valid Accounts: Cloud Accounts	Full	Identity-as-entity is cloud-native
T1530	Data from Cloud Storage	Partial	Network signal to S3 / EBS endpoints visible
T1213	Data from Information Repositories	Partial	Database connection-rate feature
T1213.001	Confluence	Partial	Internal target diversity
T1213.003	Code Repositories	Partial	—
T1098	Account Manipulation	Partial	Token-rotation anomaly
T1098.001	Account Manipulation: Additional Cloud Credentials	Partial	—
T1556	Modify Authentication Process	Out of scope	Auth-side; pair with IdP
T1133	External Remote Services	Partial	VPN / bastion entry detection
T1567	Exfiltration Over Web Service	Partial	Out-of-baseline egress to SaaS
T1041	Exfiltration Over C2 Channel	Out of scope	See post #3 (Isolation Forest + LSTM)

Insider Threat ontology. The CERT Insider Threat Center at CMU maintains the canonical taxonomy (IT sabotage, theft of IP, fraud, unintentional). Network-only UEBA addresses theft-of-IP and fraud directly; sabotage and unintentional cases require additional behavioural signals from endpoint and HR systems.

Adversary emulation. Insider-threat emulation is harder than external — there’s no public adversary-emulation atomics playbook because the technique is “log in legitimately and do your normal job, then a little more.” The cleanest test is to deliberately have a tenured engineer run an out-of-character query session and verify the pipeline scores it elevated within 24 hours.

Where This Sits in a Mature Threat Hunting Programme

Low-and-slow data exfiltration — orthogonal sibling; technique-driven vs identity-driven.
Lateral movement graph detection.
Tor and anonymizer egress — common insider technique.
Hunting AWS identity attacks.
Authentication-event threat hunting.
Kubernetes east-west detection.

Closing Thoughts

Insider threat is the detection category most security programmes underinvest in until the post-mortem after the first incident — at which point everyone realises that the data was there the whole time, nobody was looking, and the signal was perfectly visible in retrospect. Build the per-identity UEBA pipeline on VPC Flow Logs and your blind spot for off-endpoint cloud activity disappears. The first month produces a flood of legitimate-but-unusual behaviour to triage and tune past; by month three the queue runs at single-digit alerts per day and every one of them is worth a manager conversation, an HR review, or — sometimes — an actual incident response.

Happy threat hunting.

#threathunting #insiderthreat #ueba #vpcflowlogs #awssecurity #iam #identity #soc #blueteam #detectionengineering #mitreattack #behavioralanalytics
