Athena and S3 Data Lake Exfiltration — hunting the SQL-powered data heist · HackForLab AWS Threat Hunting Part 6

Athena and S3 Data Lake Exfiltration: Hunting the SQL-Powered Data Heist

AWS THREAT HUNTING · PART 06 OF 07 · 2026

If your most valuable data lives in S3 and your Athena workspace is broadly permissioned, an adversary with the right IAM role can exfiltrate the entire data lake with a few SELECT statements — and most cloud SOCs will never see it happen.

Data lakes scale to petabytes. Athena scales to query petabytes. When an adversary compromises an identity with Athena execute privileges, they get a query engine that can read everything the identity has access to and ship the results to a destination they choose. This is data exfiltration at the speed of SQL.

Part 6 of the AWS Threat Hunting series catalogues three data-lake exfiltration patterns observed in cloud incident response (IR), covering Athena, the Glue Data Catalog, and S3 Select, and ships the detection logic that surfaces them at scale.

01 · Why this hunt matters

Three structural factors make Athena exfiltration uniquely attractive. First, Athena queries can read across many S3 buckets in a single statement — the data discovery and the data exfiltration are the same operation. Second, the query results land in a configurable S3 location that the adversary can point at a bucket they control. Third, Athena query history records the queries but not always the result destination, depending on configuration. Many SOCs do not centralise Athena query telemetry.


02 · Adversary tradecraft

Pattern 01 — Large-scan query to attacker-controlled result bucket

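The attacker issues an Athena query (SELECT * with broad predicates) against high-value tables and points the result location at a bucket in their own account or at a bucket in the victim account they can read. They can set the location on the query itself (possible only when the workgroup does not enforce its configuration) or, with athena:UpdateWorkGroup, change the workgroup's default result location. The query result is the exfiltrated data.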
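Both variants of Pattern 01 leave a result location in CloudTrail when the attacker sets it explicitly. The sketch below pulls that location out of an Athena management event and compares the bucket against an allowlist. It assumes that `requestParameters` mirror the StartQueryExecution and UpdateWorkGroup API requests in camelCase; the allowlist and function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical allowlist: replace with your organisation's approved result buckets.
APPROVED_RESULT_BUCKETS = {"approved-results-bucket"}


def result_location(event: dict) -> Optional[str]:
    """Return the S3 result location an Athena CloudTrail event sets, if any.

    Assumption: requestParameters mirror the API request in camelCase.
      StartQueryExecution -> resultConfiguration.outputLocation
      UpdateWorkGroup     -> configurationUpdates.resultConfigurationUpdates.outputLocation
    """
    params = event.get("requestParameters") or {}
    name = event.get("eventName")
    if name == "StartQueryExecution":
        return (params.get("resultConfiguration") or {}).get("outputLocation")
    if name == "UpdateWorkGroup":
        updates = params.get("configurationUpdates") or {}
        return (updates.get("resultConfigurationUpdates") or {}).get("outputLocation")
    return None


def is_non_baseline(event: dict) -> bool:
    """True when the event points query results at a bucket outside the allowlist."""
    location = result_location(event)
    if location is None:
        # Caller omitted the location: the workgroup default applies and is not visible here.
        return False
    return urlparse(location).netloc not in APPROVED_RESULT_BUCKETS


event = {
    "eventSource": "athena.amazonaws.com",
    "eventName": "StartQueryExecution",
    "requestParameters": {
        "queryString": "SELECT * FROM sales.customers",
        "resultConfiguration": {"outputLocation": "s3://attacker-drop/results/"},
    },
}
print(is_non_baseline(event))  # True
```

A bucket-name allowlist cannot tell whether a bucket belongs to your account. Queries that rely on the workgroup default are covered by the Athena query history, not by this event.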

Pattern 02 — Glue catalog enumeration

Before exfiltration, the attacker enumerates the Glue Data Catalog to identify tables and columns. The enumeration pattern itself — many GetTable, GetDatabase, GetPartitions calls in a short window — is detectable.
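The enumeration burst can also be detected in a streaming pipeline with a sliding window. A sliding window flags a burst no matter where in the clock it starts. The sketch below is a minimal version. It assumes CloudTrail records as dicts with ISO-8601 `eventTime` values without fractional seconds. The 50-call and 15-minute defaults match Hunt query 02, and the function names are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# The Glue read calls listed in Hunt query 02; extend as needed.
GLUE_READ_CALLS = {"GetDatabase", "GetTable", "GetTables", "GetPartitions", "SearchTables"}


def parse_event_time(value: str) -> datetime:
    # CloudTrail eventTime is ISO-8601 UTC, e.g. "2026-06-08T10:00:00Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def enumeration_bursts(events, threshold=50, window=timedelta(minutes=15)):
    """Yield (principal, first_call, last_call, calls) the first time a principal
    makes more than `threshold` Glue read calls inside any `window`-long span."""
    recent = defaultdict(deque)  # principal -> timestamps inside the current window
    flagged = set()
    for event in sorted(events, key=lambda e: e["eventTime"]):
        if event.get("eventSource") != "glue.amazonaws.com":
            continue
        if event.get("eventName") not in GLUE_READ_CALLS:
            continue
        principal = event["userIdentity"]["arn"]
        ts = parse_event_time(event["eventTime"])
        calls = recent[principal]
        calls.append(ts)
        # Slide the window: drop calls more than `window` older than this one.
        while ts - calls[0] > window:
            calls.popleft()
        if len(calls) > threshold and principal not in flagged:
            flagged.add(principal)
            yield principal, calls[0], ts, len(calls)
```

Each principal is reported once, on the call that first crosses the threshold. A busy ETL role that spreads its calls evenly stays below the threshold in every window.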

Pattern 03 — S3 Select for selective exfiltration

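For more targeted theft, the attacker bypasses Athena and runs S3 Select directly against individual objects. S3 Select calls are recorded in CloudTrail only as data events (SelectObjectContent), so they are invisible unless data events are enabled. Because the results stream straight back to the caller, there is no result-location signal.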
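For teams that filter CloudTrail records outside SQL, the sketch below keeps SelectObjectContent calls on data-lake buckets that come from principals outside a baseline. It assumes S3 data-event records carry `bucketName` and `key` in `requestParameters`, either as a dict or as a raw JSON string (some table schemas store it that way). The baseline and bucket sets, and the function name, are hypothetical.

```python
import json


def select_events_outside_baseline(events, baseline_principals, datalake_buckets):
    """Return (eventTime, principal, bucket, key) for SelectObjectContent calls
    on data-lake buckets made by principals not in the baseline."""
    hits = []
    for event in events:
        if event.get("eventName") != "SelectObjectContent":
            continue
        params = event.get("requestParameters") or {}
        if isinstance(params, str):
            # Some Athena/SIEM schemas keep requestParameters as a raw JSON string.
            params = json.loads(params)
        bucket = params.get("bucketName")
        principal = (event.get("userIdentity") or {}).get("arn")
        if bucket in datalake_buckets and principal not in baseline_principals:
            hits.append((event.get("eventTime"), principal, bucket, params.get("key")))
    return hits
```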

03 · Telemetry needed

  • Athena query execution history (GetQueryExecution) and CloudTrail StartQueryExecution events: capture query text, result locations, and data scanned.
  • Glue Data Catalog API events in CloudTrail.
  • S3 data events on buckets storing data-lake content and on result buckets.
  • Lake Formation grant and data-access events in CloudTrail, if Lake Formation governs your data lake.
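Before hunting, you need to confirm that the S3 data events on that list are actually being recorded. The sketch below checks the `EventSelectors` list returned by CloudTrail GetEventSelectors and reports data-lake or result buckets whose object reads are not logged. It handles basic event selectors only; advanced event selectors are out of scope. The function names are hypothetical.

```python
def _value_covers_bucket(value: str, bucket: str) -> bool:
    # "arn:aws:s3" (or "arn:aws:s3:::") selects every bucket; "arn:aws:s3:::<bucket>/"
    # selects the whole bucket. Values carrying a key prefix cover only part of the
    # bucket and are conservatively treated as not covering it.
    return value in ("arn:aws:s3", "arn:aws:s3:::") or value.rstrip("/") == f"arn:aws:s3:::{bucket}"


def buckets_missing_read_data_events(event_selectors: list, buckets: list) -> list:
    """Return buckets whose object reads (GetObject, SelectObjectContent) no basic
    event selector captures. `event_selectors` is the EventSelectors list from
    CloudTrail GetEventSelectors; advanced event selectors are not evaluated."""
    missing = []
    for bucket in buckets:
        covered = any(
            selector.get("ReadWriteType", "All") in ("All", "ReadOnly")
            and resource.get("Type") == "AWS::S3::Object"
            and any(_value_covers_bucket(v, bucket) for v in resource.get("Values", []))
            for selector in event_selectors
            for resource in selector.get("DataResources", [])
        )
        if not covered:
            missing.append(bucket)
    return missing
```

A `WriteOnly` selector does not count, because S3 Select and Athena reads are read events.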

04 · Hunt queries

Hunt query 01 — Athena queries with non-baseline result destination

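SELECT query_id, principal, query_text, result_location, data_scanned_bytes
FROM athena_query_history
WHERE start_time >= TIMESTAMP '2026-06-08 00:00:00'
  AND start_time <  TIMESTAMP '2026-06-15 00:00:00'   -- whole week, including 14 June
  AND data_scanned_bytes > 10737418240                 -- > 10 GB (10 * 1024^3 bytes)
  AND (result_location IS NULL                         -- treat a missing location as non-baseline
       OR result_location NOT LIKE 's3://approved-results-bucket/%');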
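Hunt query 01 reads from `athena_query_history`, which you have to build yourself. The QueryExecution records returned by the Athena API include bytes scanned and the output location but not the caller. A minimal way to add the principal, assuming CloudTrail carries the new query ID in `responseElements.queryExecutionId` for StartQueryExecution, is to join the two sources. The function name is hypothetical. The output column names match the query above.

```python
from typing import Dict, Iterable, List


def build_query_history(start_events: Iterable[dict], executions: Iterable[dict]) -> List[Dict]:
    """Join CloudTrail StartQueryExecution events (who ran the query) with Athena
    QueryExecution records (what it scanned, where results went).

    Assumptions: CloudTrail carries the query ID in responseElements.queryExecutionId,
    and `executions` are QueryExecution dicts as returned by GetQueryExecution /
    BatchGetQueryExecution.
    """
    principal_by_id = {}
    for event in start_events:
        if event.get("eventName") != "StartQueryExecution":
            continue
        query_id = (event.get("responseElements") or {}).get("queryExecutionId")
        if query_id:
            principal_by_id[query_id] = event["userIdentity"]["arn"]

    rows = []
    for execution in executions:
        query_id = execution["QueryExecutionId"]
        rows.append({
            "query_id": query_id,
            "principal": principal_by_id.get(query_id),  # None if no matching CloudTrail event
            "query_text": execution.get("Query"),
            "result_location": (execution.get("ResultConfiguration") or {}).get("OutputLocation"),
            "data_scanned_bytes": (execution.get("Statistics") or {}).get("DataScannedInBytes", 0),
        })
    return rows
```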

Hunt query 02 — Glue catalog enumeration burst

SELECT userIdentity.arn AS principal,
       -- 15-minute tumbling window; a burst that straddles a boundary can be split
       from_unixtime(floor(to_unixtime(from_iso8601_timestamp(eventTime)) / 900) * 900) AS window_start,
       COUNT(*) AS calls,
       MIN(eventTime) AS first_call, MAX(eventTime) AS last_call
FROM cloudtrail_logs
WHERE eventSource = 'glue.amazonaws.com'
  AND eventName IN ('GetDatabase', 'GetTable', 'GetTables', 'GetPartitions', 'SearchTables')
  -- eventTime is an ISO-8601 string in the AWS-documented CloudTrail table
  AND eventTime >= '2026-06-08T00:00:00Z'
  AND eventTime <  '2026-06-15T00:00:00Z'
GROUP BY 1, 2
HAVING COUNT(*) > 50;

Hunt query 03 — S3 Select on data-lake buckets from non-baseline principal

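SELECT eventTime, userIdentity.arn AS principal, sourceIPAddress,
       -- requestParameters is a JSON string in the AWS-documented CloudTrail table
       json_extract_scalar(requestParameters, '$.bucketName') AS bucket,
       json_extract_scalar(requestParameters, '$.key') AS object_key
FROM cloudtrail_data_events
WHERE eventName = 'SelectObjectContent'
  AND eventTime >= '2026-06-08T00:00:00Z'
  AND eventTime <  '2026-06-15T00:00:00Z'
  -- NOT IN returns no rows if the subquery yields a NULL, so exclude NULLs
  AND userIdentity.arn NOT IN (SELECT principal FROM datalake_baseline_consumers
                               WHERE principal IS NOT NULL);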

05 · Sigma rule

title: Athena Large-Scan Query to Non-Baseline Result Location
id: a07b8c99-bc2d-4e34-cf3a-4b5c6d7e8f9a
status: experimental
description: |
  Detects Athena queries scanning more than 10 GB whose result location
  is outside the organisation's approved result buckets, surfacing
  data-lake exfiltration via Athena.
author: HackForLab
date: 2026/06/16
references:
  - https://hackforlab.com/aws-athena-s3-data-lake-exfiltration-hunt-2026/
tags:
  - attack.exfiltration
  - attack.t1530
  - attack.t1567.002
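logsource:
  product: aws
  service: athena   # custom source: Athena query history records, not a standard Sigma service
detection:
  selection:
    data_scanned_bytes|gt: 10737418240   # 10 GB (10 * 1024^3 bytes)
  filter_approved:
    # result_location holds the full output path, so define the placeholder
    # values as prefixes with a trailing wildcard, e.g. s3://approved-results-bucket/*
    result_location|expand: '%APPROVED_ATHENA_RESULT_BUCKETS%'
  condition: selection and not filter_approved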
fields:
  - principal
  - query_text
  - result_location
  - data_scanned_bytes
falsepositives:
  - Approved analyst ad-hoc queries (allowlist by principal)
level: high

06 · Ship as a production detection

The Athena query history is available through the Athena API (ListQueryExecutions and BatchGetQueryExecution), and Athena retains it for only 45 days, so stream it to your SIEM continuously. Map detections to T1530 (Data from Cloud Storage Object) and T1567.002 (Exfiltration to Cloud Storage).
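A minimal collector sketch follows. It assumes a boto3 Athena client, pages through ListQueryExecutions for one workgroup, and fetches details in batches of up to 50 IDs (the BatchGetQueryExecution limit). It then emits one JSON line per query for a SIEM forwarder. The function names and output field names are hypothetical. The records do not name the caller, so join them with CloudTrail StartQueryExecution events to recover the principal.

```python
import json
from typing import Iterable, Iterator


def iter_query_executions(athena, workgroup: str) -> Iterator[dict]:
    """Page through ListQueryExecutions for one workgroup and fetch details with
    BatchGetQueryExecution, which accepts at most 50 query IDs per call.
    `athena` is expected to be a boto3 Athena client, e.g. boto3.client("athena")."""
    kwargs = {"WorkGroup": workgroup}
    while True:
        page = athena.list_query_executions(**kwargs)
        ids = page.get("QueryExecutionIds", [])
        for start in range(0, len(ids), 50):
            batch = athena.batch_get_query_execution(QueryExecutionIds=ids[start:start + 50])
            yield from batch.get("QueryExecutions", [])
        token = page.get("NextToken")
        if not token:
            return
        kwargs["NextToken"] = token


def to_siem_lines(executions: Iterable[dict]) -> Iterator[str]:
    """Flatten each QueryExecution into one JSON line for a SIEM forwarder."""
    for ex in executions:
        yield json.dumps({
            "query_id": ex.get("QueryExecutionId"),
            "workgroup": ex.get("WorkGroup"),
            "query_text": ex.get("Query"),
            "result_location": (ex.get("ResultConfiguration") or {}).get("OutputLocation"),
            "data_scanned_bytes": (ex.get("Statistics") or {}).get("DataScannedInBytes"),
            "state": (ex.get("Status") or {}).get("State"),
            "submitted": (ex.get("Status") or {}).get("SubmissionDateTime"),
        }, default=str)  # boto3 returns SubmissionDateTime as a datetime
```

A production collector would also store a checkpoint (for example the last query ID it shipped) so that each run forwards only new queries instead of the full history.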

07 · False-positive considerations

Analyst ad-hoc queries are the dominant FP source. Mitigate with a clear approved-result-bucket list and a principal allowlist for known analysts. Educate analysts not to point Athena results outside the approved buckets.
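The mitigations above (an approved-result-bucket list plus a principal allowlist) combined with the 10 GB threshold can be sketched as a triage filter over rows shaped like the output of Hunt query 01. The prefixes, the allowlist, and the names below are hypothetical placeholders.

```python
from dataclasses import dataclass

LARGE_SCAN_BYTES = 10 * 1024 ** 3  # the 10 GB threshold used in this hunt (10737418240 bytes)


@dataclass(frozen=True)
class TriagePolicy:
    # Both values are hypothetical examples; populate them from your own environment.
    approved_prefixes: tuple = ("s3://approved-results-bucket/",)
    analyst_allowlist: frozenset = frozenset()


def needs_triage(row: dict, policy: TriagePolicy) -> bool:
    """Apply the large-scan threshold, the approved-bucket list, and the analyst allowlist."""
    if row.get("data_scanned_bytes", 0) <= LARGE_SCAN_BYTES:
        return False
    location = row.get("result_location") or ""  # a missing location is never approved
    if location.startswith(policy.approved_prefixes):
        return False
    return row.get("principal") not in policy.analyst_allowlist
```

An allowlisted principal suppresses every alert for that identity, including alerts caused by a stolen analyst credential, so keep the list short and review it regularly.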

08 · Response actions

Response: revoke the principal’s active session credentials; pull the query text to understand what data was queried; check the result bucket for the actual exfiltrated content; rotate any credentials that may have been exposed in the queried tables; and audit the principal’s other recent activity for additional indicators.
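For a compromised role, a common way to revoke sessions already issued is an inline deny policy keyed on `aws:TokenIssueTime`, the same shape the IAM console applies under "Revoke active sessions". The sketch below builds that policy document. It applies only to temporary credentials; long-term IAM user access keys must be deactivated instead. The role name in the usage comment is hypothetical.

```python
import json
from datetime import datetime, timezone


def revoke_older_sessions_policy(cutoff: datetime) -> str:
    """Build an inline IAM policy that denies every action to temporary credentials
    issued before `cutoff`."""
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    issued_before = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": ["*"],
            "Resource": ["*"],
            "Condition": {"DateLessThan": {"aws:TokenIssueTime": issued_before}},
        }],
    }
    return json.dumps(policy, indent=2)


# Usage sketch (role name is hypothetical):
# boto3.client("iam").put_role_policy(
#     RoleName="compromised-athena-role",
#     PolicyName="AWSRevokeOlderSessions",
#     PolicyDocument=revoke_older_sessions_policy(datetime.now(timezone.utc)),
# )
```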

09 · FAQ

Should we lock down Athena result locations?

Yes. Set a result location on each workgroup and enable "Override client-side settings" (EnforceWorkGroupConfiguration) so that query-level settings cannot override it.
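To audit that control across workgroups, the sketch below inspects a GetWorkGroup response and reports whether the result location is enforced and inside the approved buckets. The function name and approved prefixes are hypothetical. The commented call shows the UpdateWorkGroup fields that would apply the fix, with a hypothetical workgroup name.

```python
def workgroup_result_findings(get_work_group_response: dict, approved_prefixes: tuple) -> list:
    """Check one workgroup (GetWorkGroup response shape) for a result location
    that is enforced and inside the approved buckets."""
    config = get_work_group_response["WorkGroup"].get("Configuration", {})
    findings = []
    if not config.get("EnforceWorkGroupConfiguration", False):
        findings.append("client-side settings can override the result location")
    location = (config.get("ResultConfiguration") or {}).get("OutputLocation")
    if location is None:
        findings.append("no S3 result location configured")
    elif not location.startswith(approved_prefixes):
        findings.append(f"result location {location} is outside the approved buckets")
    return findings


# Remediation sketch (workgroup name and bucket are hypothetical):
# boto3.client("athena").update_work_group(
#     WorkGroup="analytics",
#     ConfigurationUpdates={
#         "EnforceWorkGroupConfiguration": True,
#         "ResultConfigurationUpdates": {"OutputLocation": "s3://approved-results-bucket/analytics/"},
#     },
# )
```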

What about Lake Formation?

Lake Formation adds a row-level and column-level permission layer that is worth deploying for sensitive data lakes. It is an additional control, not a replacement for Athena query monitoring.

How do we monitor S3 Select?

Enable data events on the buckets storing sensitive data. S3 Select operations produce SelectObjectContent events when data events are on.

Is 10 GB a good large-scan threshold?

10 GB is a reasonable starting threshold for organisations whose typical analyst queries scan less. Tune for your environment.
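One way to tune the threshold is to derive it from the bytes scanned by queries you already know to be legitimate, for example a high percentile of recent history, and fall back to 10 GB when no history exists. The sketch below uses a nearest-rank percentile. The 99th-percentile default is an assumption; choose a percentile that matches your alert budget.

```python
import math

DEFAULT_THRESHOLD_BYTES = 10 * 1024 ** 3  # the 10 GB starting point


def nearest_rank_percentile(values, pct: float):
    """Nearest-rank percentile: the smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def tuned_threshold(historical_scanned_bytes, pct: float = 99.0):
    """Derive a large-scan threshold from historical bytes scanned by legitimate
    queries; fall back to the 10 GB starting point when no history is available."""
    history = list(historical_scanned_bytes)
    if not history:
        return DEFAULT_THRESHOLD_BYTES
    return nearest_rank_percentile(history, pct)
```

By construction, a 99th-percentile threshold still fires on about 1% of historical queries, which is why it pairs with the approved-bucket filter rather than replacing it.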

Can adversaries use Athena federated queries to reach outside data?

Yes — federated queries reach across data sources. Audit federated data source registrations and treat them as part of your data perimeter.
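Registered federated sources appear in Athena's data catalog list. The sketch below flags connector-backed catalogs that are missing from an approved set, using the `DataCatalogsSummary` list returned by ListDataCatalogs. It assumes `LAMBDA` and `FEDERATED` are the catalog types that denote federated sources. The approved set and function name are hypothetical.

```python
# Athena catalog types backed by connectors rather than Glue or Hive (assumed set).
FEDERATED_CATALOG_TYPES = {"LAMBDA", "FEDERATED"}


def unapproved_federated_catalogs(data_catalogs_summary: list, approved_names: set) -> list:
    """Return federated catalog names absent from the approved set.
    Input is the DataCatalogsSummary list from Athena ListDataCatalogs."""
    return sorted(
        catalog["CatalogName"]
        for catalog in data_catalogs_summary
        if catalog.get("Type") in FEDERATED_CATALOG_TYPES
        and catalog["CatalogName"] not in approved_names
    )
```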
