Tor and Anonymizer Egress Hunting on VPC Flow Logs

Tor and Anonymizer Egress Hunting on VPC Flow Logs — Exit-node enrichment and multi-hop circuit analysis — HACKFORLAB cover

From the hunt desk. Every blue team I have worked with has, at some point, asked the question: “do we have anyone going out through Tor right now?” The answer is almost always yes, the answer is almost always uninteresting, and the answer is almost always the security engineer running a research project. But every few months it is not — it is a compromised production EC2 instance reaching out through a Tor circuit to a hidden service that the operator uses as a C2 listener, or it is an insider running a Tor browser on a corporate VPN to research something they should not be researching. The 1% you actually care about gets lost in the 99% benign noise unless your detection pipeline knows the difference.

This post is the playbook for hunting Tor and other anonymizer egress (I2P, commercial VPNs, residential-proxy services, Cloudflare WARP) with VPC Flow Logs as the primary telemetry. We cover exit-node list ingestion several times a day, multi-hop circuit reconstruction signatures, per-identity rarity scoring (so the security engineer doing legitimate research doesn’t alert, but the compromised EC2 instance does), and the cross-correlation with CloudTrail that turns a flow alert into a triageable incident.

It is post #9 in our VPC Flow Log detection-engineering series. Pair it with TLS fingerprinting (the Tor client’s TLS handshake has a distinctive JA4), DGA + DNS-tunnel hunting (some anonymizers use DNS-over-Tor), and FFT C2 beacon detection for the periodicity sibling.

Why Tor Egress is a Harder Problem Than It Sounds

The naïve detection — “alert when any source talks to any IP on the published Tor exit-node list” — fails for three reasons in production:

The exit-node list changes hourly. A blocklist generated yesterday catches roughly 80% of today’s traffic. Tor’s canonical exit list is the right source; most SOCs hit a stale mirror.
Bridges and pluggable transports bypass the list entirely. Tor bridges are unpublished entry points specifically designed to evade enumeration. obfs4 scrambles Tor traffic so it looks like uniformly random bytes with no recognisable handshake, and meek tunnels it inside ordinary HTTPS requests through a fronting CDN. Detection has to look at behavioural features, not the destination IP alone.
The noise floor is high. Researchers, security engineers, journalists, executives doing competitor research — many legitimate roles produce occasional Tor traffic. A pipeline that alerts on every Tor connection is silenced within a week.

The pipeline below addresses all three: it ingests the canonical exit list multiple times per day, layers on behavioural detection for bridge / pluggable-transport bypass, and uses per-identity rarity scoring so the alert volume stays manageable.

The Detection Pipeline

Tor and anonymizer egress detection pipeline — five-step architecture from VPC Flow Logs through IP enrichment to per-identity rarity scoring

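Ingest. VPC Flow Logs from every account land in S3 as Parquet. In parallel, a Lambda fetches the Tor exit-node list every 4 hours, together with the dan.me.uk Tor consolidated list, the Spamhaus DROP / Stratos / abuse.ch reputation lists, and ranges for VPN-as-a-service operators. The Tor feed should cover guard relays as well as exits: a Tor client inside your VPC connects to a guard, while the exit is the last hop, outside your network. Everything is cached in DynamoDB with a TTL.
IP enrichment. Every external destination IP seen in the last 24 hours is enriched with: (a) is_tor_exit, (b) is_tor_guard, (c) is_tor_bridge_candidate (using a separate research list), (d) is_commercial_vpn (Mullvad, NordVPN, ExpressVPN, ProtonVPN, IVPN), (e) is_residential_proxy, (f) ASN classification.
Multi-hop circuit reconstruction. For sources that talk to Tor relays (any IP on the guard or exit lists, or destination ports 9001/9030, the default ORPort and DirPort; 9050 and 9150 are the local SOCKS ports of the tor daemon and Tor Browser and normally never leave the host), examine the inter-arrival time (IAT) pattern across flow records. Flow logs only show the first hop, but that hop is still distinctive: Tor carries data in fixed-size cells (512 bytes, 514 on link protocol v4 and later) that are onion-encrypted in three layers inside a single TLS connection to the guard. The packet-size distribution therefore clusters around the cell size, and the average packet size per flow record tends to sit around 500–530 bytes regardless of the application payload.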
Per-identity rarity scoring. Cross-reference VPC Flow Log source IP with IAM principal from CloudTrail (instance launch metadata). A source IP that has never produced Tor traffic in 30 days gets scored higher than one that produces a daily 200-byte connection that has been there for a year. Identity context turns the alert from “Tor flow” to “compromised production EC2 instance reaching Tor for the first time today.”
Alert + investigative queue. Scored alerts go to the SIEM with identity context attached. Triage is fast because the row contains the IAM role, the instance type, the account, the destination’s ASN, and the behavioural signature class.
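A minimal Python sketch of the enrichment step follows. It assumes each feed has already been downloaded as plain text with one address per line (anything after `#` ignored) and that VPN operator ranges are available as CIDRs; the names `parse_ip_list`, `Feed`, `EnrichmentCache` and `enrich` are hypothetical, not part of any AWS or Tor Project API. The TTL check stands in for the DynamoDB TTL: a feed older than its TTL is flagged in the row rather than silently trusted, so a stale exit list becomes visible before it quietly costs you detections.

```python
import ipaddress
import time
from dataclasses import dataclass, field
from typing import Optional, Union


def parse_ip_list(text: str) -> set[str]:
    """Parse a plain-text feed: one IP per line; '#' comments and blank lines are ignored."""
    ips: set[str] = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        try:
            ips.add(str(ipaddress.ip_address(line)))  # normalise the textual form
        except ValueError:
            continue  # skip a malformed line rather than dropping the whole feed
    return ips


@dataclass
class Feed:
    ips: set[str]
    fetched_at: float   # Unix seconds when the feed was downloaded
    ttl_seconds: float  # e.g. 4 * 3600 for a feed refreshed every 4 hours

    def is_stale(self, now: float) -> bool:
        return now - self.fetched_at > self.ttl_seconds


@dataclass
class EnrichmentCache:
    # flag name (e.g. "is_tor_exit", "is_tor_guard") -> feed of IPs
    feeds: dict[str, Feed] = field(default_factory=dict)
    # CIDR ranges for commercial VPN operators
    vpn_ranges: list[Union[ipaddress.IPv4Network, ipaddress.IPv6Network]] = field(default_factory=list)

    def enrich(self, ip: str, now: Optional[float] = None) -> dict:
        """Build one ip_enrich-style row; stale feeds are reported, not hidden."""
        now = time.time() if now is None else now
        addr = ipaddress.ip_address(ip)
        key = str(addr)
        row: dict = {"ip": key, "stale_feeds": []}
        for name, feed in self.feeds.items():
            row[name] = key in feed.ips
            if feed.is_stale(now):
                row["stale_feeds"].append(name)
        # Membership across IP versions simply returns False, so mixed lists are safe.
        row["is_commercial_vpn"] = any(addr in net for net in self.vpn_ranges)
        return row
```

In the Lambda, `now` would normally be left at its default; passing it explicitly just makes the staleness behaviour easy to test.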

Behavioural Signatures of the Major Anonymizers

Tor (plain). Outbound TCP to known guard or exit IPs, or to ports 9001/9030 (the default ORPort and DirPort). Average packet size 500–530 bytes. Sustained connection, hours-long. The JA3 / JA4 fingerprint of the Tor client’s TLS handshake is distinctive (post #7 of this series).
Tor over obfs4 bridge. Looks like an opaque encrypted TCP stream, with no TLS handshake or other recognisable protocol, to an unknown destination. The bytes-per-packet distribution can still trend toward 512 because of the underlying Tor cells, although obfs4’s padding blurs it. Catch via the size signature plus destination-IP rarity.
I2P. Distinctive UDP traffic to known I2P infrastructure (less rotation than Tor). Less common, but the signature is very clean: each router uses UDP (SSU) and TCP (NTCP2) on a single port it picks at random (by default between 9000 and 31000), so ports such as 12668 or 17777 are per-install values rather than fixed indicators. The tell is one fixed local port fanning out to many peers on diverse remote ports.
Commercial VPN. OpenVPN (UDP/1194 or TCP/443), WireGuard (UDP/51820), IKEv2 (UDP/500 and UDP/4500). Allow-list it if it’s the corporate VPN; alert if it’s Mullvad / NordVPN / ProtonVPN from a production workload.
Residential proxy. Outbound HTTPS to known residential-proxy provider ASNs. Hard to distinguish from legitimate HTTPS without the ASN reputation feed.
Cloudflare WARP. WireGuard-based, but to Cloudflare’s anycast IPs. Legitimate for personal use; suspicious from a production EC2.
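The port-driven parts of these signatures reduce to a small lookup, with enrichment flags taking priority. This is a sketch under stated assumptions: `classify_flow` and the class labels are hypothetical, protocol numbers are the IANA values that VPC Flow Logs record (6 for TCP, 17 for UDP), and Cloudflare WARP and residential proxies are left to enrichment because, as described above, they are identified by destination ASN or IP rather than by port.

```python
TCP, UDP = 6, 17  # IANA protocol numbers, as recorded in the VPC Flow Logs "protocol" field

# (protocol, destination port) -> candidate signature class.
# 9050/9150 are Tor's default local SOCKS ports and normally stay on loopback,
# so they are not treated as egress signatures here.
PORT_SIGNATURES = {
    (TCP, 9001): "tor_orport",
    (TCP, 9030): "tor_dirport",
    (UDP, 1194): "openvpn",
    (UDP, 51820): "wireguard",
    (UDP, 500): "ikev2",
    (UDP, 4500): "ikev2",
}


def classify_flow(protocol: int, dstport: int, enrichment: dict) -> str:
    """Return a candidate anonymizer class for one egress flow.

    Enrichment flags win over ports: TCP/443 on its own could be OpenVPN-over-TCP,
    meek, an obfs4 bridge or plain HTTPS, so it stays "unclassified" without them.
    """
    if enrichment.get("is_tor_exit") or enrichment.get("is_tor_guard"):
        return "tor"
    if enrichment.get("is_commercial_vpn"):
        return "commercial_vpn"
    if enrichment.get("is_residential_proxy"):
        return "residential_proxy"
    return PORT_SIGNATURES.get((protocol, dstport), "unclassified")
```

The output is a candidate class for the scoring stage, not a verdict; the size signature and rarity score still decide whether a row is alert-worthy.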

Athena SQL — The Tor Egress Hunt

Assume your IP enrichment table (ip_enrich) is updated daily by the Lambda and has columns ip, is_tor_exit, is_tor_guard, is_commercial_vpn, is_residential_proxy, asn, asn_name.

WITH egress_flows AS (
    SELECT srcaddr, dstaddr, dstport, bytes, packets, start, "end"  -- END is reserved in Athena SQL, so quote it
    FROM central_vpc_flow_logs
    WHERE action = 'ACCEPT'
      AND srcaddr LIKE '10.%'
      -- Drop RFC 1918 destinations; match 172.16.0.0/12 only, not every public 172.x address
      AND dstaddr NOT LIKE '10.%'
      AND NOT regexp_like(dstaddr, '^172\.(1[6-9]|2[0-9]|3[01])\.')
      AND dstaddr NOT LIKE '192.168.%'
      AND day BETWEEN '2026/05/09' AND '2026/05/15'
),
enriched AS (
    SELECT e.*, ie.is_tor_exit, ie.is_tor_guard,
           ie.is_commercial_vpn, ie.is_residential_proxy,
           ie.asn_name,
           CAST(e.bytes AS DOUBLE) / NULLIF(e.packets, 0) AS bytes_per_packet
    FROM egress_flows e
    LEFT JOIN ip_enrich ie ON e.dstaddr = ie.ip
),
host_summary AS (
    SELECT srcaddr,
           COUNT(*)                                                          AS total_egress_flows,
           COUNT_IF(is_tor_exit OR is_tor_guard)                             AS tor_flow_count,
           COUNT_IF(is_commercial_vpn)                                       AS commercial_vpn_flow_count,
           COUNT_IF(is_residential_proxy)                                    AS residential_proxy_flow_count,
           -- 9001/9030: default ORPort/DirPort. 9050/9150 are local SOCKS ports and
           -- only show up here if one host proxies through another.
           COUNT_IF(dstport IN (9001, 9030, 9050, 9150))                     AS tor_port_count,
           COUNT_IF(bytes_per_packet BETWEEN 480 AND 540)                    AS tor_size_pattern_count,
           -- Include guards: a Tor client inside the VPC connects to a guard, not an exit.
           SUM(CASE WHEN is_tor_exit OR is_tor_guard THEN bytes ELSE 0 END)  AS tor_bytes,
           ARRAY_AGG(DISTINCT asn_name) FILTER (WHERE is_tor_exit OR is_tor_guard) AS tor_asns_touched
    FROM enriched
    GROUP BY srcaddr
)
SELECT *
FROM host_summary
WHERE tor_flow_count > 0
   OR commercial_vpn_flow_count > 0
   OR residential_proxy_flow_count > 0
   OR (tor_port_count > 0 AND tor_size_pattern_count > 50)
ORDER BY tor_bytes DESC;

The next-stage Lambda pulls each source host into a per-identity rarity context using CloudTrail RunInstances metadata + a 30-day historical baseline of (identity, anonymizer-type) pairs. The rarity score is what graduates a row into the alert queue.
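The rarity term can be computed with nothing more than a counter over the baseline. A minimal sketch, assuming the 30-day baseline has already been reduced to (day, identity, anonymizer_type) rows by joining flows to IAM identities; `rarity_scorer` is a hypothetical helper, and counting distinct days rather than raw flows is our illustrative choice, so that a chatty but long-standing daily connection saturates at one observation per day.

```python
from collections import Counter
from datetime import date, timedelta


def rarity_scorer(baseline, today: date, window_days: int = 30):
    """Build a scorer from a baseline of (day, identity, anonymizer_type) rows.

    Each (identity, anonymizer_type) pair is counted once per day it appears in
    the window [today - window_days, today) and scored 1 / (days_seen + 1):
    1.0 for a pair never seen before, close to 0 for a pair seen every day.
    """
    window_start = today - timedelta(days=window_days)
    days_seen = Counter()
    for day, identity, anon_type in set(baseline):  # set() drops same-day repeats
        if window_start <= day < today:
            days_seen[(identity, anon_type)] += 1

    def score(identity: str, anon_type: str) -> float:
        # Counter returns 0 for unseen pairs without inserting them.
        return 1.0 / (days_seen[(identity, anon_type)] + 1)

    return score
```

With this scoring, a role that has reached Tor on each of the last 30 days scores 1/31, while a production role touching Tor for the first time scores 1.0.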

Feature Engineering

Feature	Source	Formula / method	What it captures
Anonymizer type flag	IP enrichment	boolean per type	Categorical classification
Bytes-per-packet signature	VPC Flow Logs	distribution centred around 512	Tor cell signature
Destination port pattern	VPC Flow Logs	Tor-known port match	Direct Tor port use
Per-identity rarity	Historical baseline + IAM tag	1 / (count_in_30d_baseline + 1)	Novelty per identity
Workload-type plausibility	EC2 instance tags + CloudTrail	plausibility map per workload class	Research host vs production
Session duration	VPC Flow Logs	last end − first start over consecutive records of the same flow	Real Tor sessions are minutes-to-hours
ASN reputation	Threat-intel feed	Spamhaus / Team Cymru lookup	Adds confidence
JA4 fingerprint match	TLS sensor	Tor client JA4	High-confidence direct match
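Two of the flow-derived features need care because VPC Flow Logs aggregate packets into capture windows: a record’s `start` and `end` bound that window, not the TCP session, so an hours-long Tor session arrives as a series of records that have to be stitched together. The sketch below assumes records are dicts carrying the standard `srcaddr`, `dstaddr`, `dstport`, `bytes`, `packets`, `start` and `end` fields (Unix seconds); `flow_features` and the 120-second stitching gap are hypothetical, and the 480 to 540 byte band is the same one the Athena query uses.

```python
from collections import defaultdict

TOR_BPP_BAND = (480, 540)  # same bytes-per-packet band as the Athena query
SESSION_GAP_SECONDS = 120  # hypothetical: records closer than this belong to one session


def flow_features(records):
    """Group records by (srcaddr, dstaddr, dstport) and compute per-flow features."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["srcaddr"], r["dstaddr"], r["dstport"])].append(r)

    features = {}
    for key, recs in groups.items():
        recs.sort(key=lambda r: r["start"])

        # Stitch consecutive capture windows into sessions; keep the longest.
        longest = 0
        sess_start, sess_end = recs[0]["start"], recs[0]["end"]
        for r in recs[1:]:
            if r["start"] - sess_end <= SESSION_GAP_SECONDS:
                sess_end = max(sess_end, r["end"])
            else:
                longest = max(longest, sess_end - sess_start)
                sess_start, sess_end = r["start"], r["end"]
        longest = max(longest, sess_end - sess_start)

        # Fraction of records whose average packet size falls in the Tor band.
        in_band = sum(
            1
            for r in recs
            if r["packets"] and TOR_BPP_BAND[0] <= r["bytes"] / r["packets"] <= TOR_BPP_BAND[1]
        )
        features[key] = {
            "records": len(recs),
            "longest_session_seconds": longest,
            "tor_size_fraction": in_band / len(recs),
        }
    return features
```

Keeping the fraction of in-band records, rather than a single average, stops one bulk transfer from hiding an otherwise cell-sized flow.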

Where Tor Detection Goes Wrong

Researchers and bug-bounty hunters in your engineering team need to reach Tor for legitimate work. Tag the source IPs in an allow-list maintained by the security team and refresh weekly.
SOC analysts investigating active threats also need Tor — they pivot to hidden services to read ransomware-affiliate leak sites. Same allow-list pattern.
Bring-your-own-VPN in the office. Personal devices on corporate Wi-Fi often have Mullvad / ProtonVPN running. This noise will dominate the queue if the network is not segmented. Restrict the hunt to production VPCs.
Cloudflare WARP. Legitimate for personal users but routinely shows up in dev / CI environments. Treat it like a commercial VPN — alert if it’s from production.
Residential-proxy library use in legitimate SDKs. A handful of marketing SDKs use residential proxies for web-scraping inside legitimate applications. Confirm with the application owner before alerting.
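The allow-list guidance can be enforced as a gate in front of the alert queue. A sketch, assuming a hypothetical allow-list keyed by IAM identity and anonymizer type, with each entry stamped with the date it was last reviewed so that an entry nobody refreshes within a week lapses instead of suppressing alerts indefinitely. `should_alert`, `AllowEntry`, the `production` workload label and the 0.5 rarity threshold are all illustrative.

```python
from dataclasses import dataclass
from datetime import date, timedelta

ALLOWLIST_MAX_AGE = timedelta(days=7)  # weekly refresh
PRODUCTION_CLASSES = {"production"}    # hypothetical workload-class tag values


@dataclass(frozen=True)
class AllowEntry:
    identity: str         # e.g. an IAM role ARN
    anonymizer_type: str  # "tor", "commercial_vpn", ...
    last_reviewed: date


def should_alert(identity, anonymizer_type, workload_class, rarity, allowlist, today,
                 threshold=0.5):
    """Return True if the flow should reach the alert queue."""
    for entry in allowlist:
        fresh = today - entry.last_reviewed <= ALLOWLIST_MAX_AGE
        if fresh and entry.identity == identity and entry.anonymizer_type == anonymizer_type:
            return False  # reviewed, still-current exception
    if workload_class not in PRODUCTION_CLASSES:
        return False  # the hunt is scoped to production workloads
    return rarity >= threshold
```

An expired entry falls through to the normal checks, so a forgotten exception for a departed researcher surfaces as an alert rather than a permanent blind spot.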

MITRE ATT&CK Techniques Covered

ATT&CK ID	Technique / sub-technique	Coverage	Hunter notes
T1090	Proxy (parent)	Full	Tor is the canonical proxy technique
T1090.003	Proxy: Multi-hop Proxy	Full	Tor specifically
T1090.002	Proxy: External Proxy	Full	Commercial VPN / residential proxy
T1090.004	Proxy: Domain Fronting	Partial	Meek pluggable transport uses domain fronting
T1071.001	Application Layer Protocol: Web Protocols	Partial	meek carries Tor inside HTTPS requests
T1572	Protocol Tunneling	Full	Tor tunnels streams as onion-encrypted cells inside one TLS link
T1573	Encrypted Channel	Full	All anonymizer flows are encrypted
T1573.002	Asymmetric Cryptography	Full	—
T1102	Web Service	Partial	Hidden services on Tor
T1568	Dynamic Resolution	Out of scope	Tor doesn’t use DNS for hidden services

Adversary emulation. Stand up a lab EC2 instance, install Tor (apt-get install tor), wait for circuit establishment, browse check.torproject.org over the SOCKS proxy. The pipeline should flag the host within one query interval. For commercial VPN emulation, install Mullvad CLI on a lab instance. Atomic Red Team’s T1090.003 tests include scripted Tor-proxy variants.

Adversary groups. Tor-using adversaries are diffuse; the technique is too common for clean attribution. Persistent users include G0080 — Cobalt Group, several state-aligned actors who use it for OPSEC, and almost every ransomware affiliate during the negotiation phase.

D3FEND mapping. D3-OTF (Outbound Traffic Filtering) at the alert layer; D3-NTA (Network Traffic Analysis) as the umbrella.

Where This Sits in a Mature Threat Hunting Programme

FFT C2 beacon detection — Tor circuits can carry beacons.
TLS fingerprinting — Tor browser JA4 is a strong signal.
DGA + DNS-tunnel hunting.
Cloud cryptojacking detection — Tor egress + mining traffic is the worst-case combination.
LotL Markov kill-chain.

Closing Thoughts

Tor is not a sign of compromise. Tor egress from a production database server at 3am is a sign of compromise. The detection that matters is not “do we see Tor” but “where, when, and which identity.” Build the per-identity rarity layer, accept that the security-engineer false positives will dominate at first, and grow the allow-list intentionally. Within a month the queue is clean and the rare alerts that fire are worth dropping everything to investigate.

Happy threat hunting.

