If your most valuable data lives in S3 and your Athena workspace is broadly permissioned, an adversary with the right IAM role can exfiltrate the entire data lake with a few SELECT statements, and many cloud SOCs will never see it happen.
Data lakes scale to petabytes. Athena scales to query petabytes. When an adversary compromises an identity with Athena execute privileges, they get a query engine that can read everything the identity has access to and ship the results to a destination they choose. This is data exfiltration at the speed of SQL.
Part 6 of the AWS Threat Hunting series catalogues three data-lake exfiltration patterns observed in cloud IR, spanning Athena, the Glue Data Catalog, and S3 Select. It also ships the detection logic that surfaces them at scale.

01 · Why this hunt matters
Three structural factors make Athena exfiltration attractive to adversaries. First, a single Athena query can read across many S3 buckets, so data discovery and data exfiltration are the same operation. Second, query results land in a configurable S3 location. Unless the workgroup enforces its own result location, the adversary can point that location at a bucket they control. Third, depending on configuration, Athena telemetry records the query but not always the result destination. On top of that, many SOCs do not centralise Athena query telemetry at all.
02 · Adversary tradecraft
Pattern 01 — Large-scan query to attacker-controlled result bucket
The attacker issues an Athena query (SELECT * with broad predicates) against high-value tables. They then point the result location at a bucket in their own account or at a victim-account bucket they can read. The location can be set per query, or on the workgroup if they can modify it. The query result is the exfiltrated data.
Pattern 02 — Glue catalog enumeration
Before exfiltration, the attacker enumerates the Glue Data Catalog to identify tables and columns. The enumeration pattern itself — many GetTable, GetDatabase, GetPartitions calls in a short window — is detectable.
Pattern 03 — S3 Select for selective exfiltration
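For more targeted theft, the attacker uses S3 Select directly against objects and bypasses Athena entirely. S3 Select calls (SelectObjectContent) appear in CloudTrail only as S3 data events, so they are invisible unless data events are enabled. Because the results stream straight back to the caller, there is also no result-location signal.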
03 · Telemetry needed
- Athena query history (via the Athena API) and CloudTrail StartQueryExecution events: together they capture the query text, result location, and calling principal.
- Glue catalog API events in CloudTrail.
- S3 data events on buckets storing data-lake content and on result buckets.
- Lake Formation grants and the related CloudTrail events, if Lake Formation governs access to your Glue catalog and data lake.
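Coverage for S3 Select and result-bucket reads depends on CloudTrail data events actually being enabled. The sketch below checks that from the output of CloudTrail's GetEventSelectors call (for example `boto3.client("cloudtrail").get_event_selectors(TrailName=...)`). It assumes the trail uses basic event selectors. Advanced event selectors are not parsed, and the bucket names in the test data are hypothetical.

```python
def s3_read_data_event_coverage(selectors_response, buckets):
    """Map each bucket name to True if a basic event selector logs S3 read data
    events for all of its objects. Only the "EventSelectors" key is inspected;
    advanced event selectors are ignored (assumption: basic selectors in use)."""
    covered = {bucket: False for bucket in buckets}
    for selector in selectors_response.get("EventSelectors", []):
        # SelectObjectContent and GetObject are read events, so WriteOnly selectors do not help.
        if selector.get("ReadWriteType", "All") not in ("All", "ReadOnly"):
            continue
        for resource in selector.get("DataResources", []):
            if resource.get("Type") != "AWS::S3::Object":
                continue
            for value in resource.get("Values", []):
                if value in ("arn:aws:s3", "arn:aws:s3:::"):
                    # Account-wide selector: every bucket is covered.
                    return {bucket: True for bucket in buckets}
                for bucket in buckets:
                    # Whole-bucket selector. Prefix-scoped values such as
                    # arn:aws:s3:::bucket/some/prefix are conservatively reported as gaps.
                    if value == f"arn:aws:s3:::{bucket}/":
                        covered[bucket] = True
    return covered
```

Any bucket mapped to False is a blind spot for the S3 Select hunt.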
04 · Hunt queries
Hunt query 01 — Athena queries with non-baseline result destination
SELECT query_id, principal, query_text, result_location, data_scanned_bytes
FROM athena_query_history
WHERE start_time >= '2026-06-08' AND start_time < '2026-06-15'
  AND data_scanned_bytes > 10737418240  -- 10 GiB
  AND result_location NOT LIKE 's3://approved-results-bucket/%';
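If you evaluate hunt query 01 in a pipeline rather than in SQL, compare the result location's bucket exactly instead of relying on a raw string prefix. That way, an allowlist entry written without a trailing slash cannot match a look-alike bucket. The following is a minimal Python sketch. The allowlist entry reuses the example bucket from the query, and the record field names mirror the hypothetical `athena_query_history` columns.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with your approved result prefixes.
APPROVED_RESULT_PREFIXES = ["s3://approved-results-bucket/"]
LARGE_SCAN_BYTES = 10 * 1024 ** 3  # 10 GiB, the same threshold as hunt query 01


def split_s3_uri(uri):
    """Split 's3://bucket/key/prefix' into ('bucket', 'key/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def is_approved_location(uri, approved=APPROVED_RESULT_PREFIXES):
    bucket, key = split_s3_uri(uri)
    for prefix in approved:
        approved_bucket, approved_key = split_s3_uri(prefix)
        # Exact bucket comparison, so 'approved-results-bucket-evil' never
        # passes as a match for 'approved-results-bucket'.
        if bucket == approved_bucket and key.startswith(approved_key):
            return True
    return False


def flag_query(record, threshold=LARGE_SCAN_BYTES):
    """True for a large scan whose results went outside the approved prefixes."""
    return record["data_scanned_bytes"] > threshold and not is_approved_location(record["result_location"])
```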
Hunt query 02 — Glue catalog enumeration burst
SELECT userIdentity.arn AS principal,
       -- 15-minute bucket derived from the ISO 8601 eventTime string
       from_unixtime(floor(to_unixtime(from_iso8601_timestamp(eventTime)) / 900) * 900) AS window_start,
       COUNT(*) AS calls,
       MIN(eventTime) AS first_call, MAX(eventTime) AS last_call
FROM cloudtrail_logs
WHERE eventSource = 'glue.amazonaws.com'
  AND eventName IN ('GetDatabase', 'GetTable', 'GetTables', 'GetPartitions', 'SearchTables')
  AND eventTime >= '2026-06-08' AND eventTime < '2026-06-15'
GROUP BY 1, 2
HAVING COUNT(*) > 50;
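A SQL aggregate only approximates "more than 50 calls within 15 minutes". If you can pull the raw Glue events into a script, a rolling window checks that condition exactly for every principal. This Python sketch assumes the events are parsed CloudTrail records (dicts) with `eventTime` in CloudTrail's `YYYY-MM-DDTHH:MM:SSZ` form. The 50-call and 15-minute parameters match hunt query 02.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Same event names as hunt query 02.
ENUMERATION_EVENTS = {"GetDatabase", "GetTable", "GetTables", "GetPartitions", "SearchTables"}


def parse_event_time(value):
    """Parse a CloudTrail eventTime such as '2026-06-08T10:15:00Z' (assumed format)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def enumeration_bursts(events, threshold=50, window=timedelta(minutes=15)):
    """Return {principal_arn: (window_start, window_end, calls)} for the first
    rolling window in which a principal made more than `threshold` Glue
    catalog read calls."""
    times_by_principal = defaultdict(list)
    for event in events:
        if event.get("eventSource") == "glue.amazonaws.com" and event.get("eventName") in ENUMERATION_EVENTS:
            arn = event.get("userIdentity", {}).get("arn")
            times_by_principal[arn].append(parse_event_time(event["eventTime"]))

    bursts = {}
    for principal, times in times_by_principal.items():
        times.sort()
        in_window = deque()
        for t in times:
            in_window.append(t)
            # Drop calls older than `window` relative to the current call.
            while t - in_window[0] > window:
                in_window.popleft()
            if len(in_window) > threshold:
                bursts[principal] = (in_window[0], t, len(in_window))
                break
    return bursts
```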
Hunt query 03 — S3 Select on data-lake buckets from non-baseline principal
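SELECT eventTime, userIdentity.arn AS principal, sourceIPAddress,
       -- requestParameters is a JSON string in the standard Athena CloudTrail table
       json_extract_scalar(requestParameters, '$.bucketName') AS bucket,
       json_extract_scalar(requestParameters, '$.key') AS object_key
FROM cloudtrail_data_events
WHERE eventSource = 's3.amazonaws.com'
  AND eventName = 'SelectObjectContent'
  AND eventTime >= '2026-06-08' AND eventTime < '2026-06-15'
  -- NOT IN returns no rows if the subquery yields a NULL, so exclude NULLs
  AND userIdentity.arn NOT IN (SELECT principal FROM datalake_baseline_consumers
                               WHERE principal IS NOT NULL);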
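The same check can run over raw CloudTrail JSON, where `requestParameters` is an object rather than a string. This sketch assumes the records are already parsed into dicts and the baseline is a collection of principal ARNs (hypothetical input). Empty or null baseline entries are dropped, so an event with no ARN is never treated as baseline. Matching is on the exact ARN, and assumed-role session ARNs change per session, so they may need normalising first.

```python
def s3_select_from_non_baseline(records, baseline_principals):
    """Return S3 Select (SelectObjectContent) events whose caller is not in the baseline."""
    baseline = {p for p in baseline_principals if p}  # drop None / empty entries
    hits = []
    for record in records:
        if record.get("eventSource") != "s3.amazonaws.com" or record.get("eventName") != "SelectObjectContent":
            continue
        principal = (record.get("userIdentity") or {}).get("arn")
        if principal in baseline:
            continue
        params = record.get("requestParameters") or {}
        hits.append({
            "eventTime": record.get("eventTime"),
            "principal": principal,
            "sourceIPAddress": record.get("sourceIPAddress"),
            "bucket": params.get("bucketName"),
            "object_key": params.get("key"),
        })
    return hits
```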
05 · Sigma rule
title: Athena Large-Scan Query to Non-Baseline Result Location
id: a07b8c99-bc2d-4e34-cf3a-4b5c6d7e8f9a
status: experimental
description: |
    Detects an Athena query scanning more than 10 GiB whose result location
    is outside the organisation's approved result buckets; surfaces
    data-lake exfiltration via Athena.
author: HackForLab
date: 2026/06/16
references:
    - https://hackforlab.com/aws-athena-s3-data-lake-exfiltration-hunt-2026/
tags:
    - attack.collection
    - attack.exfiltration
    - attack.t1530
    - attack.t1567.002
logsource:
    product: aws
    service: athena
detection:
    selection:
        data_scanned_bytes|gt: 10737418240
    filter_approved:
        result_location|expand: '%APPROVED_ATHENA_RESULT_BUCKETS%'
    condition: selection and not filter_approved
fields:
    - principal
    - query_text
    - result_location
    - data_scanned_bytes
falsepositives:
    - Approved analyst ad-hoc queries (allowlist by principal)
level: high
06 · Ship as a production detection
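The Athena query history is available through the Athena API (ListQueryExecutions, then BatchGetQueryExecution or GetQueryExecution). Poll it and stream it to your SIEM together with the CloudTrail StartQueryExecution events, which identify the calling principal. Map detections to T1530 (Data from Cloud Storage) and T1567.002 (Exfiltration to Cloud Storage).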
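One gap to plan for: to our knowledge, the QueryExecution object returned by GetQueryExecution and BatchGetQueryExecution carries the query text, result location, and scan statistics, but not the calling identity. A table like `athena_query_history` with a `principal` column therefore has to be built by joining against CloudTrail StartQueryExecution events on the query execution ID. The following sketch performs that join. It assumes you already hold both inputs as parsed dicts, and its output field names are hypothetical, chosen to match hunt query 01.

```python
def build_history_rows(start_query_events, query_executions):
    """Join CloudTrail StartQueryExecution events (caller identity) with Athena
    QueryExecution objects (query text, result location, scan size)."""
    principal_by_id = {}
    for event in start_query_events:
        if event.get("eventName") != "StartQueryExecution":
            continue
        query_id = (event.get("responseElements") or {}).get("queryExecutionId")
        if query_id:
            principal_by_id[query_id] = (event.get("userIdentity") or {}).get("arn")

    rows = []
    for qe in query_executions:
        query_id = qe["QueryExecutionId"]
        rows.append({
            "query_id": query_id,
            # None when no matching CloudTrail event was found
            "principal": principal_by_id.get(query_id),
            "query_text": qe.get("Query"),
            "result_location": qe.get("ResultConfiguration", {}).get("OutputLocation"),
            "data_scanned_bytes": qe.get("Statistics", {}).get("DataScannedInBytes", 0),
            "start_time": qe.get("Status", {}).get("SubmissionDateTime"),
            "workgroup": qe.get("WorkGroup"),
        })
    return rows
```

To feed it, page through boto3's `list_query_executions(WorkGroup=...)` and pass batches of up to 50 IDs to `batch_get_query_execution(QueryExecutionIds=...)`, collecting the `QueryExecutions` list. Athena retains query history only for a limited period (documented as 45 days), so poll on a schedule.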
07 · False-positive considerations
Analyst ad-hoc queries are the dominant FP source. Mitigate with a clear approved-result-bucket list and a principal allowlist for known analysts. Educate analysts not to point Athena results outside the approved buckets.
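A principal allowlist keyed on raw ARNs breaks for role-based analysts, because each assumed-role session ARN embeds a session name. The sketch below collapses `arn:aws:sts::<account>:assumed-role/<role>/<session>` to the role's IAM ARN before matching. It assumes the allowlisted roles have no IAM path, since assumed-role ARNs omit the path.

```python
def canonical_principal(arn):
    """Collapse an assumed-role session ARN to its role's IAM ARN; return other
    ARNs unchanged. Assumes the role has no IAM path (assumed-role ARNs omit it)."""
    parts = arn.split(":", 5)
    if len(parts) != 6:
        return arn  # not an ARN shape we understand; leave it as-is
    _, partition, service, _region, account, resource = parts
    if service == "sts" and resource.startswith("assumed-role/"):
        role_name = resource.split("/")[1]
        return f"arn:{partition}:iam::{account}:role/{role_name}"
    return arn


def is_allowlisted_analyst(arn, allowlist):
    """True if the canonicalised principal appears in the analyst allowlist."""
    allowed = {canonical_principal(entry) for entry in allowlist}
    return canonical_principal(arn) in allowed
```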
08 · Response actions
Response: revoke the principal's active session credentials; pull the query text to understand what data was queried; check the result bucket for the exfiltrated content; rotate any credentials that may have been exposed in the queried tables; and audit the principal's other recent activity for additional indicators.
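For a compromised role, one common way to revoke sessions that have already been issued is an inline deny policy conditioned on `aws:TokenIssueTime`. This is the same pattern the IAM console applies with its "Revoke active sessions" action. The minimal sketch below builds that policy document. The function name is illustrative.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def revoke_older_sessions_policy(cutoff: Optional[datetime] = None) -> str:
    """Return a JSON IAM policy that denies all actions to sessions whose
    credentials were issued before `cutoff` (defaults to now, UTC)."""
    if cutoff is None:
        cutoff = datetime.now(timezone.utc)
    # Normalise to UTC; note that astimezone() treats a naive datetime as local time.
    cutoff = cutoff.astimezone(timezone.utc)
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": ["*"],
                "Resource": ["*"],
                "Condition": {
                    "DateLessThan": {"aws:TokenIssueTime": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")}
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

Attach it with `boto3.client("iam").put_role_policy(RoleName=..., PolicyName=..., PolicyDocument=...)`. New sessions remain possible, so also close whatever path the adversary used to assume the role. The policy does not affect an IAM user's long-term access keys; deactivate those separately, for example with `update_access_key(..., Status="Inactive")`.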
09 · FAQ
Should we lock down Athena result locations?
Yes — set workgroup-level enforced result locations that cannot be overridden by query-level settings.
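To audit existing workgroups against that advice, the sketch below inspects the response of Athena's GetWorkGroup call. It reports workgroups that do not enforce their configuration or that write results outside approved prefixes. The approved prefix list is a hypothetical input.

```python
def audit_workgroup(get_work_group_response, approved_prefixes):
    """Return findings for one workgroup, given the dict returned by Athena's
    GetWorkGroup call and a list of approved s3:// result prefixes."""
    workgroup = get_work_group_response["WorkGroup"]
    name = workgroup.get("Name")
    config = workgroup.get("Configuration", {})
    location = config.get("ResultConfiguration", {}).get("OutputLocation")
    findings = []
    # A missing flag is treated as "not enforced" (conservative assumption).
    if not config.get("EnforceWorkGroupConfiguration", False):
        findings.append(f"{name}: query-level settings can override the result location")
    if location is None:
        findings.append(f"{name}: no workgroup result location configured")
    elif not any(location.startswith(prefix) for prefix in approved_prefixes):
        findings.append(f"{name}: result location {location} is outside the approved buckets")
    return findings
```

A flagged workgroup can then be remediated with `update_work_group(WorkGroup=..., ConfigurationUpdates={"EnforceWorkGroupConfiguration": True, "ResultConfigurationUpdates": {"OutputLocation": ...}})`.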
What about Lake Formation?
Lake Formation adds a row-level and column-level permission layer that is worth deploying for sensitive data lakes. It is an additional control, not a replacement for Athena query monitoring.
How do we monitor S3 Select?
Enable CloudTrail S3 data events, including read events, on the buckets storing sensitive data. S3 Select calls are then logged as SelectObjectContent events.
Is 10 GB a good large-scan threshold?
10 GB is a reasonable starting threshold for organisations whose typical analyst queries scan less. Tune for your environment.
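One way to tune the threshold is to derive it from your own baseline of per-query scan sizes. The sketch below takes a high percentile of historical `DataScannedInBytes` values and never drops below the 10 GiB starting point. Both the percentile and the floor are assumptions to adjust, not recommended values.

```python
import statistics

GIB = 1024 ** 3


def scan_threshold(baseline_scan_bytes, percentile=99, floor_bytes=10 * GIB):
    """Suggest a large-scan threshold in bytes: the given percentile (1-99) of
    historical per-query scan sizes, but never below floor_bytes."""
    if len(baseline_scan_bytes) < 2:
        return floor_bytes  # not enough history; fall back to the starting threshold
    # quantiles(n=100) returns 99 cut points: the 1st through 99th percentiles.
    cut_points = statistics.quantiles(baseline_scan_bytes, n=100)
    return max(int(cut_points[percentile - 1]), floor_bytes)
```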
Can adversaries use Athena federated queries to reach outside data?
Yes — federated queries reach across data sources. Audit federated data source registrations and treat them as part of your data perimeter.
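A starting point for that audit is to list the catalogs registered with Athena (ListDataCatalogs) and flag any that are not on an approved list. New registrations also show up in CloudTrail as CreateDataCatalog calls. The sketch below works over the ListDataCatalogs response. The approved set is a hypothetical input that defaults to the built-in AwsDataCatalog.

```python
# Hypothetical default: only the account's built-in Glue catalog is approved.
DEFAULT_APPROVED_CATALOGS = frozenset({"AwsDataCatalog"})


def unapproved_catalogs(list_data_catalogs_response, approved_names=DEFAULT_APPROVED_CATALOGS):
    """Return (name, type) for every catalog in a ListDataCatalogs response that
    is not on the approved list. Type LAMBDA indicates a federated connector."""
    return [
        (summary["CatalogName"], summary.get("Type"))
        for summary in list_data_catalogs_response.get("DataCatalogsSummary", [])
        if summary["CatalogName"] not in approved_names
    ]
```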