
From the hunt desk. Every blue team I have worked with has, at some point, asked the question: “do we have anyone going out through Tor right now?” The answer is almost always yes, the answer is almost always uninteresting, and the answer is almost always the security engineer running a research project. But every few months it is not — it is a compromised production EC2 instance reaching out through a Tor circuit to a hidden service that the operator uses as a C2 listener, or it is an insider running a Tor browser on a corporate VPN to research something they should not be researching. The 1% you actually care about gets lost in the 99% benign noise unless your detection pipeline knows the difference.
This post is the playbook for hunting Tor and other anonymizer egress (I2P, commercial VPNs, residential-proxy services, Cloudflare WARP) from VPC Flow Logs alone. We cover exit-node list ingestion several times a day, multi-hop circuit reconstruction signatures, per-identity rarity scoring (so the security engineer doing legitimate research doesn’t alert, but the compromised EC2 instance does), and the cross-correlation with CloudTrail that turns a flow alert into a triageable incident.
It is post #9 in our VPC Flow Log detection-engineering series. Pair it with TLS fingerprinting (Tor browser has a distinctive JA4), DGA + DNS-tunnel hunting (some anonymizers use DNS-over-Tor), and FFT C2 beacon detection for the periodicity sibling.
Why Tor Egress is a Harder Problem Than It Sounds
The naïve detection — “alert when any source talks to any IP on the published Tor exit-node list” — fails for three reasons in production:
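- The relay lists change hourly. Tor publishes a new consensus every hour, so a blocklist generated yesterday catches roughly 80% of today’s traffic. Egress hunting also needs more than exits. A Tor client’s outbound connection goes to its guard relay (or a bridge), not to an exit, so match against the full relay list and treat exits as a subset. Tor’s canonical lists are the right source; most SOCs hit a stale mirror.
- Bridges and pluggable transports bypass the list entirely. Tor bridges are unpublished entry points specifically designed to evade enumeration. obfs4 scrambles Tor traffic into a byte stream with no recognisable protocol fingerprint, and meek tunnels it through ordinary HTTPS via domain fronting. Detection has to look at behavioural features, not the destination IP alone.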
- The noise floor is high. Researchers, security engineers, journalists, executives doing competitor research — many legitimate roles produce occasional Tor traffic. A pipeline that alerts on every Tor connection is silenced within a week.
The pipeline below addresses all three: it ingests the canonical exit-list multiple times per day, layers on behavioural detection for bridge / pluggable-transport bypass, and uses per-identity rarity scoring so the alert volume stays manageable.
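The list-freshness part of that is small enough to sketch. This minimal version assumes the list is served as plain text with one address per line, which is the shape of the Tor Project's bulk exit list; other mirrors may differ. The ExitListCache class is a hypothetical in-memory stand-in for the DynamoDB cache, and its 4-hour TTL simply mirrors the fetch cadence. HTTP fetching is left out so the parser can be tested offline.

```python
import ipaddress
import time
from typing import Optional, Set

# Assumption: TTL mirrors the 4-hour fetch cadence described in this post.
DEFAULT_TTL_SECONDS = 4 * 60 * 60


def parse_exit_list(text: str) -> Set[str]:
    """Parse a one-address-per-line list into normalised IP strings.

    Blank lines, '#' comments and malformed entries are skipped, so one bad
    line in a mirror does not poison the whole refresh.
    """
    addresses: Set[str] = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        try:
            addresses.add(str(ipaddress.ip_address(line)))
        except ValueError:
            continue
    return addresses


class ExitListCache:
    """Hypothetical in-memory stand-in for the DynamoDB cache with a TTL."""

    def __init__(self, ttl_seconds: int = DEFAULT_TTL_SECONDS):
        self.ttl_seconds = ttl_seconds
        self._addresses: Set[str] = set()
        self._loaded_at: Optional[float] = None

    def load(self, text: str, now: Optional[float] = None) -> int:
        # Replace the whole set atomically and record when it was loaded.
        self._addresses = parse_exit_list(text)
        self._loaded_at = time.time() if now is None else now
        return len(self._addresses)

    def is_stale(self, now: Optional[float] = None) -> bool:
        # Never loaded counts as stale; otherwise compare age against the TTL.
        if self._loaded_at is None:
            return True
        now = time.time() if now is None else now
        return now - self._loaded_at >= self.ttl_seconds

    def is_tor_exit(self, ip: str) -> bool:
        return ip in self._addresses
```

Keeping the parser separate from the fetch means the Lambda only has to pass the response body to parse_exit_list and write the result with an expiry attribute.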
The Detection Pipeline

- Ingest. VPC Flow Logs from every account land in S3 as Parquet. In parallel, a Lambda fetches the Tor exit-node list every 4 hours, the dan.me.uk Tor consolidated list, and the Spamhaus DROP / Stratos / abuse.ch lists for VPN-as-a-service operators. Cached in DynamoDB with a TTL.
- IP enrichment. Every external destination IP in the last 24 hours is enriched with: (a) is_tor_exit, (b) is_tor_bridge_candidate (using a separate research list), (c) is_commercial_vpn (Mullvad, NordVPN, ExpressVPN, ProtonVPN, IVPN), (d) is_residential_proxy, (e) ASN classification.
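- Multi-hop circuit reconstruction. For sources that talk to Tor entries, examine the inter-arrival-time (IAT) pattern. Entries are ports 9001 and 9030 (the default relay ORPort and DirPort) or any IP flagged as a Tor guard or exit. Ports 9050 and 9150 are the local SOCKS listeners of the tor daemon and Tor Browser, so they only appear in flow logs when a host uses a SOCKS proxy running on another machine. Real Tor circuits show distinctive jitter and packet-size patterns because all traffic travels in fixed 512-byte cells, wrapped in three layers of onion encryption inside a TLS link to the first hop. The bytes-per-packet distribution clusters around the cell size, and average packet size tends to sit around 500–530 bytes regardless of the application payload.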
- Per-identity rarity scoring. Cross-reference VPC Flow Log source IP with IAM principal from CloudTrail (instance launch metadata). A source IP that has never produced Tor traffic in 30 days gets scored higher than one that produces a daily 200-byte connection that has been there for a year. Identity context turns the alert from “Tor flow” to “compromised production EC2 instance reaching Tor for the first time today.”
- Alert + investigative queue. Scored alerts go to the SIEM with identity context attached. Triage is fast because the row contains the IAM role, the instance type, the account, the destination’s ASN, and the behavioural signature class.
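The enrichment step can be sketched as a pure function over pre-loaded feeds. This is illustrative only. The set and lookup arguments are hypothetical stand-ins for whatever feeds you actually load. Commercial-VPN operators are assumed to be available as CIDR ranges and residential-proxy providers as ASNs. Only the field names mirror the ip_enrich table used in the Athena query. Bridge candidates (b) are left out because the research list behind them is outside this sketch.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set, Tuple

# Hypothetical lookup signature: ip -> (asn, asn_name), or (None, None) if unknown.
AsnLookup = Callable[[str], Tuple[Optional[int], Optional[str]]]


@dataclass(frozen=True)
class Enrichment:
    """One row of the enrichment table; field names mirror ip_enrich."""
    ip: str
    is_tor_exit: bool
    is_tor_guard: bool
    is_commercial_vpn: bool
    is_residential_proxy: bool
    asn: Optional[int]
    asn_name: Optional[str]


def enrich(ip: str, tor_exits: Set[str], tor_guards: Set[str],
           vpn_networks: Iterable, resproxy_asns: Set[int],
           asn_lookup: AsnLookup) -> Enrichment:
    addr = ipaddress.ip_address(ip)
    key = str(addr)  # normalised form, matching how the lists were parsed
    asn, asn_name = asn_lookup(key)
    return Enrichment(
        ip=key,
        is_tor_exit=key in tor_exits,
        is_tor_guard=key in tor_guards,
        # Commercial VPN operators matched by published CIDR ranges.
        is_commercial_vpn=any(addr in net for net in vpn_networks),
        # Residential proxies matched by provider ASN, so (e) feeds (d).
        is_residential_proxy=asn is not None and asn in resproxy_asns,
        asn=asn,
        asn_name=asn_name,
    )
```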
Behavioural Signatures of the Major Anonymizers
- Tor (plain). Outbound TCP to known exit IPs OR known guard IPs OR ports 9001/9030/9050/9150. Average packet size 500–530 bytes. Sustained connection, hours-long. JA3 / JA4 of Tor browser is distinctive (post #7 of this series).
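- Tor over obfs4 bridge. In flow logs this looks like an ordinary long-lived TCP session (often on port 443) to an unknown destination. The bytes-per-packet distribution can still trend toward 512 because of the underlying Tor cells. The exception is a bridge that enables obfs4’s iat-mode, which reshapes packet sizes and timing. Catch it via the size signature plus destination-IP rarity.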
- I2P. Distinctive UDP traffic to known I2P infrastructure (less rotation than Tor). Less common but the signature is very clean — UDP on ports 12668, 17777, plus high port-diversity on the local side.
- Commercial VPN. OpenVPN (UDP/1194 or TCP/443), WireGuard (UDP/51820), IKEv2 (UDP/500 and UDP/4500). Allow-list it if it’s the corporate VPN; alert if it’s Mullvad / NordVPN / ProtonVPN from a production workload.
- Residential proxy. Outbound HTTPS to known residential-proxy provider ASNs. Hard to distinguish from legitimate HTTPS without the ASN reputation feed.
- Cloudflare WARP. WireGuard-based, but to Cloudflare’s anycast IPs. Legitimate for personal use; suspicious from a production EC2.
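The port and size signatures above can be combined into a first-pass, per-flow classifier. This is a minimal sketch under stated assumptions. Protocol numbers follow IANA (6 = TCP, 17 = UDP) as they appear in the flow log's protocol field. The 480–540 bytes-per-packet band is the one the Athena query uses. The label strings are hypothetical. TCP/443 is deliberately not a signature on its own: OpenVPN-over-443, obfs4 bridges and residential proxies all share it with ordinary HTTPS, so those cases need the destination-IP enrichment.

```python
from typing import List, Optional

TCP, UDP = 6, 17  # IANA protocol numbers, as in the flow log "protocol" field

TOR_PORTS = {9001, 9030, 9050, 9150}  # relay ORPort/DirPort defaults plus SOCKS ports
I2P_UDP_PORTS = {12668, 17777}        # as listed in the I2P signature above
CELL_BAND = (480.0, 540.0)            # same band as the Athena size filter


def bytes_per_packet(total_bytes: int, packets: int) -> Optional[float]:
    # Flow records with zero packets carry no size signal.
    return None if packets <= 0 else total_bytes / packets


def classify_flow(protocol: int, dstport: int, total_bytes: int, packets: int,
                  is_tor_exit: bool = False, is_tor_guard: bool = False) -> List[str]:
    """Return every signature label a single flow record matches (possibly none)."""
    labels = []
    if is_tor_exit or is_tor_guard:
        labels.append("tor_known_ip")
    if protocol == TCP and dstport in TOR_PORTS:
        labels.append("tor_port")
    bpp = bytes_per_packet(total_bytes, packets)
    if protocol == TCP and bpp is not None and CELL_BAND[0] <= bpp <= CELL_BAND[1]:
        labels.append("tor_cell_size")
    if protocol == UDP:
        if dstport == 1194:
            labels.append("openvpn_port")
        elif dstport == 51820:
            labels.append("wireguard_port")
        elif dstport in (500, 4500):
            labels.append("ikev2_port")
        elif dstport in I2P_UDP_PORTS:
            labels.append("i2p_port")
    return labels
```

A single flow is weak evidence. In practice you aggregate these labels per source host over the hunt window, which is what the Athena query does with its per-host counts.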
Athena SQL — The Tor Egress Hunt
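Assume your IP enrichment table (ip_enrich) is rebuilt by the ingestion Lambda at least daily. Ideally it runs on the same 4-hour cadence as the exit-list fetch, so the Tor columns never lag the published lists. It has columns ip, is_tor_exit, is_tor_guard, is_commercial_vpn, is_residential_proxy, asn, asn_name.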
```sql
WITH egress_flows AS (
  -- Accepted flows from 10.0.0.0/8 VPC space to non-RFC 1918 destinations.
  -- "end" is a reserved word in Athena (Trino) and must be quoted.
  SELECT srcaddr, dstaddr, dstport, bytes, packets, start, "end"
  FROM central_vpc_flow_logs
  WHERE action = 'ACCEPT'
    AND srcaddr LIKE '10.%'
    -- Only 172.16.0.0/12 is private; a bare '172.%' would also drop public space.
    AND NOT regexp_like(dstaddr, '^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)')
    AND day BETWEEN '2026/05/09' AND '2026/05/15'
),
enriched AS (
  SELECT e.*,
         ie.is_tor_exit, ie.is_tor_guard,
         ie.is_commercial_vpn, ie.is_residential_proxy,
         ie.asn_name,
         CAST(e.bytes AS DOUBLE) / NULLIF(e.packets, 0) AS bytes_per_packet
  FROM egress_flows e
  LEFT JOIN ip_enrich ie ON e.dstaddr = ie.ip
),
host_summary AS (
  SELECT srcaddr,
         COUNT(*) AS total_egress_flows,
         COUNT_IF(is_tor_exit OR is_tor_guard) AS tor_flow_count,
         COUNT_IF(is_commercial_vpn) AS commercial_vpn_flow_count,
         COUNT_IF(is_residential_proxy) AS residential_proxy_flow_count,
         COUNT_IF(dstport IN (9001, 9030, 9050, 9150)) AS tor_port_count,
         COUNT_IF(bytes_per_packet BETWEEN 480 AND 540) AS tor_size_pattern_count,
         -- Count guard as well as exit bytes: a Tor client's first hop is a guard.
         SUM(CASE WHEN is_tor_exit OR is_tor_guard THEN bytes ELSE 0 END) AS tor_bytes,
         ARRAY_AGG(DISTINCT asn_name) FILTER (WHERE is_tor_exit OR is_tor_guard) AS tor_asns_touched
  FROM enriched
  GROUP BY srcaddr
)
SELECT *
FROM host_summary
WHERE tor_flow_count > 0
   OR commercial_vpn_flow_count > 0
   OR residential_proxy_flow_count > 0
   OR (tor_port_count > 0 AND tor_size_pattern_count > 50)
ORDER BY tor_bytes DESC;
```
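Prefix matching on address strings is easy to get subtly wrong because only 172.16.0.0/12 is private. A filter like '172.%' also discards public space such as 172.217.0.0/16. A small stdlib check is a cheap way to test whatever egress filter you put in the query against sample addresses before trusting it. The helper names are hypothetical.

```python
import ipaddress

# RFC 1918 private ranges; 172.16.0.0/12 covers only 172.16.x.x through 172.31.x.x.
RFC1918 = tuple(ipaddress.ip_network(n)
                for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"))


def is_rfc1918(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    # An IPv6 address is never "in" an IPv4 network, so it falls through to False.
    return any(addr in net for net in RFC1918)


def naive_prefix_private(ip: str) -> bool:
    """Mimics string filters of the form LIKE '10.%' / '172.%' / '192.168.%'."""
    return ip.startswith(("10.", "172.", "192.168."))


def mismatches(ips):
    """Addresses where the naive prefix filter and the real ranges disagree."""
    return [ip for ip in ips if is_rfc1918(ip) != naive_prefix_private(ip)]
```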
The next-stage Lambda pulls each source host into a per-identity rarity context using CloudTrail RunInstances metadata + a 30-day historical baseline of (identity, anonymizer-type) pairs. The rarity score is what graduates a row into the alert queue.
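The rarity score can be sketched from the feature definition in the table below, 1 / (count_in_30d_baseline + 1), counted per (identity, anonymizer type) pair. This sketch assumes the baseline arrives as (day, identity, anonymizer_type) observations. Identity is whatever principal you resolved from CloudTrail, for example an IAM role ARN. The sketch counts distinct days rather than flows, so one chatty afternoon does not look like a year of habit. The graduation threshold is a hypothetical placeholder to tune against your own queue volume.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (identity, anonymizer_type)

# Hypothetical cut-off: a pair seen on at most one prior day still graduates.
GRADUATE_THRESHOLD = 0.5


def build_baseline(observations: Iterable[Tuple[date, str, str]], today: date,
                   window_days: int = 30) -> Dict[Pair, int]:
    """Count distinct days each (identity, anonymizer_type) pair appeared.

    The window is [today - window_days, today); today's flows are the ones being scored.
    """
    start = today - timedelta(days=window_days)
    seen_days = defaultdict(set)
    for day, identity, anon_type in observations:
        if start <= day < today:
            seen_days[(identity, anon_type)].add(day)
    return {pair: len(days) for pair, days in seen_days.items()}


def rarity(baseline: Dict[Pair, int], identity: str, anon_type: str) -> float:
    # 1 / (count_in_30d_baseline + 1): never-seen pairs score 1.0.
    return 1.0 / (baseline.get((identity, anon_type), 0) + 1)


def graduates(baseline: Dict[Pair, int], identity: str, anon_type: str,
              threshold: float = GRADUATE_THRESHOLD) -> bool:
    return rarity(baseline, identity, anon_type) >= threshold
```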
Feature Engineering
| Feature | Source | Formula / method | What it captures |
|---|---|---|---|
| Anonymizer type flag | IP enrichment | boolean per type | Categorical classification |
| Bytes-per-packet signature | VPC Flow Logs | distribution centred around 512 | Tor cell signature |
| Destination port pattern | VPC Flow Logs | Tor-known port match | Direct Tor port use |
| Per-identity rarity | Historical baseline + IAM tag | 1 / (count_in_30d_baseline + 1) | Novelty per identity |
| Workload-type plausibility | EC2 instance tags + CloudTrail | plausibility map per workload class | Research host vs production |
| Session duration | VPC Flow Logs | end − start | Real Tor sessions are minutes-to-hours |
| ASN reputation | Threat-intel feed | Spamhaus / Team Cymru lookup | Adds confidence |
| JA4 fingerprint match | TLS sensor | Tor browser JA4 | High-confidence direct match |
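The workload-type plausibility feature can start as a plain lookup. It maps a workload class, derived from EC2 instance tags, to the anonymizer types that class is expected to produce. The tag key, class names and mapping below are hypothetical examples. The design choice worth keeping is failing closed: an untagged or unknown instance is treated like production, where no anonymizer flow is plausible.

```python
from typing import Dict, FrozenSet, Mapping

# Hypothetical mapping from workload class to anonymizer types it may plausibly produce.
PLAUSIBLE: Dict[str, FrozenSet[str]] = {
    "security-research": frozenset({"tor", "i2p", "commercial_vpn", "residential_proxy", "warp"}),
    "dev-ci": frozenset({"warp"}),
    "production": frozenset(),
}

# Hypothetical tag key; untagged instances default to production (fail closed).
WORKLOAD_TAG = "workload-class"


def workload_class(tags: Mapping[str, str]) -> str:
    return tags.get(WORKLOAD_TAG, "production")


def is_plausible(tags: Mapping[str, str], anon_type: str) -> bool:
    # Classes missing from the map get an empty set, so nothing is plausible for them.
    return anon_type in PLAUSIBLE.get(workload_class(tags), frozenset())
```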
Where Tor Detection Goes Wrong
- Researchers and bug-bounty hunters in your engineering team need to reach Tor for legitimate work. Tag the source IPs in an allow-list maintained by the security team and refresh weekly.
- SOC analysts investigating active threats also need Tor — they pivot to hidden services to read ransomware-affiliate leak sites. Same allow-list pattern.
- Bring-your-own-VPN in the office. Personal devices on corporate Wi-Fi often have Mullvad / ProtonVPN running. This noise will dominate the queue if the network is not segmented. Restrict the hunt to production VPCs.
- Cloudflare WARP. Legitimate for personal users but routinely shows up in dev / CI environments. Treat it like a commercial VPN — alert if it’s from production.
- Residential-proxy library use in legitimate SDKs. A handful of marketing SDKs use residential proxies for web-scraping inside legitimate applications. Confirm with the application owner before alerting.
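The allow-list patterns above work best with entries that expire. That way, a range nobody re-confirms drops out on its own instead of silently suppressing alerts forever. This minimal sketch assumes each entry is a source CIDR with an owner and a last-confirmed date. The AllowEntry record is illustrative, and the 7-day window matches the weekly refresh suggested above.

```python
import ipaddress
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List

# Entries not re-confirmed within this window stop suppressing alerts.
REFRESH_WINDOW = timedelta(days=7)


@dataclass(frozen=True)
class AllowEntry:
    cidr: str             # source range, e.g. a research subnet
    owner: str            # who vouched for it
    last_confirmed: date  # last time the security team re-confirmed it


def active_networks(entries: Iterable[AllowEntry], today: date) -> List:
    return [ipaddress.ip_network(e.cidr) for e in entries
            if today - e.last_confirmed <= REFRESH_WINDOW]


def is_allow_listed(src_ip: str, entries: Iterable[AllowEntry], today: date) -> bool:
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in active_networks(entries, today))
```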
MITRE ATT&CK Techniques Covered
| ATT&CK ID | Technique / sub-technique | Coverage | Hunter notes |
|---|---|---|---|
| T1090 | Proxy (parent) | Full | Tor is the canonical proxy technique |
| T1090.003 | Proxy: Multi-hop Proxy | Full | Tor specifically |
| T1090.002 | Proxy: External Proxy | Full | Commercial VPN / residential proxy |
| T1090.004 | Proxy: Domain Fronting | Partial | Meek pluggable transport uses domain fronting |
| T1071.001 | Application Layer Protocol: Web Protocols | Partial | meek rides HTTPS; obfs4 bridges often listen on TCP/443 |
| T1572 | Protocol Tunneling | Full | Tor carries payloads in three layers of onion encryption inside a TLS link |
| T1573 | Encrypted Channel | Full | All anonymizer flows are encrypted |
| T1573.002 | Asymmetric Cryptography | Full | — |
| T1102 | Web Service | Partial | Hidden services on Tor |
| T1568 | Dynamic Resolution | Out of scope | Tor doesn’t use DNS for hidden services |
Adversary emulation. Stand up a lab EC2 instance, install Tor (apt-get install tor), wait for circuit establishment, browse check.torproject.org over the SOCKS proxy. The pipeline should flag the host within one query interval. For commercial VPN emulation, install Mullvad CLI on a lab instance. Atomic Red Team T1090 includes scripted variants.
Adversary groups. Tor-using adversaries are diffuse; the technique is too common for clean attribution. Persistent users include G0080 — Cobalt Group, several state-aligned actors who use it for OPSEC, and almost every ransomware affiliate during the negotiation phase.
D3FEND mapping. D3-OTF (Outbound Traffic Filtering) at the alert layer; D3-NTA as the umbrella.
Where This Sits in a Mature Threat Hunting Programme
- FFT C2 beacon detection — Tor circuits can carry beacons.
- TLS fingerprinting — Tor browser JA4 is a strong signal.
- DGA + DNS-tunnel hunting.
- Cloud cryptojacking detection — Tor egress + mining traffic is the worst-case combination.
- LotL Markov kill-chain.
Closing Thoughts
Tor is not a sign of compromise. Tor egress from a production database server at 3am is a sign of compromise. The detection that matters is not “do we see Tor” but “where, when, and which identity.” Build the per-identity rarity layer, accept that the security-engineer false positives will dominate at first, and grow the allow-list intentionally. Within a month the queue is clean and the rare alerts that fire are worth dropping everything to investigate.
Happy threat hunting.
#threathunting #tor #anonymizer #vpcflowlogs #awssecurity #c2 #soc #blueteam #detectionengineering #ueba #mitreattack