A Practical Detection Engineering Framework Used by Modern SOCs

DETECTION ENGINEERING · CORNERSTONE GUIDE

The difference between an alert farm and a detection engineering practice is not better tooling — it is a disciplined, repeatable workflow. This article is the workflow. Five stages. One worked example. A rule template you can fork today.

ON THIS PAGE

01 · Why a framework
02 · The five stages
03 · Worked example
04 · Rule template
05 · Failure analysis
06 · SOC vs DE
07 · FAQ

Why detection engineering needs a framework

Most security operations centers ship detections the way a startup ships features in week one: someone writes an SPL query at 11pm, drops it into production, and waits. Six months later the rule fires 800 times a day, the on-call engineer has muted the channel, and an incident slips past because the noisy precursor was buried in the alert pile.

A detection engineering framework is the antidote. It is not a tool, it is not a vendor product, and it is not a SIEM. It is the operational discipline that takes a hypothesis about adversary behavior and walks it through five accountable stages until it either earns a place in production or earns retirement. Modern detection engineering teams treat every rule like a piece of software — versioned, tested, instrumented, and eventually deprecated.

This article codifies the framework we use across customer engagements and the one baked into the HuntIntel platform. If you read nothing else, read the five-stage workflow below and the worked example. Everything else is scaffolding.

The framework: a five-stage workflow

Stage 01 · Hypothesis

Every detection starts as a sentence: “An adversary with foothold X will attempt behavior Y, which will leave evidence Z.” Anchor the hypothesis to a MITRE ATT&CK technique ID. If you cannot, the hypothesis is not specific enough — refine before proceeding. Hypotheses are cheap; bad hypotheses ship expensive detections.
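One way to hold hypotheses to this standard is to store them as structured records and reject any that lack a well-formed technique ID. The sketch below is illustrative only: the `Hypothesis` class and its field names are hypothetical, and the regular expression checks only the shape of an ATT&CK ID (for example T1078 or T1078.004), not whether the technique actually exists.

```python
import re
from dataclasses import dataclass
from typing import List

# Shape check only: "T" + four digits, optionally ".NNN" for a sub-technique.
TECHNIQUE_ID = re.compile(r"T\d{4}(\.\d{3})?")


@dataclass(frozen=True)
class Hypothesis:
    foothold: str      # X: what the adversary already holds
    behavior: str      # Y: what they will attempt
    evidence: str      # Z: what that attempt leaves behind
    technique_id: str  # MITRE ATT&CK anchor

    def problems(self) -> List[str]:
        """Return reasons the hypothesis is not yet specific enough."""
        issues = [f"{name} is empty"
                  for name in ("foothold", "behavior", "evidence")
                  if not getattr(self, name).strip()]
        if not TECHNIQUE_ID.fullmatch(self.technique_id):
            issues.append(
                f"technique_id {self.technique_id!r} is not a well-formed ATT&CK ID")
        return issues

    def sentence(self) -> str:
        """Render the hypothesis in the X / Y / Z sentence form."""
        return (f"An adversary with {self.foothold} will attempt "
                f"{self.behavior}, which will leave {self.evidence}.")


h = Hypothesis(
    foothold="credentials in an external AWS account",
    behavior="sts:AssumeRole into our environment",
    evidence="a CloudTrail AssumeRole event from an unbaselined account ID",
    technique_id="T1078.004",
)
print(h.sentence())
print(h.problems())
```

A hypothesis whose `problems()` list is non-empty goes back for refinement rather than forward to Stage 02.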

Stage 02 · Data Inventory

Before you write a single line of logic, prove the data exists. What log source carries the evidence? Which fields are populated reliably? What is the retention window? What is the typical ingest delay? This stage produces an honest answer to one question: can we actually see this? If the answer is no, the output is a data-gap ticket, not a half-blind detection.
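The inventory questions can be turned into a repeatable check. The sketch below assumes you can pull a sample of recent events as dictionaries with flattened field names. The function names, the 99 percent population threshold, and the sample events are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def field_population(events: List[Dict[str, Optional[str]]],
                     required: Iterable[str]) -> Dict[str, float]:
    """Fraction of sampled events in which each required field is non-empty."""
    total = len(events)
    rates = {}
    for field in required:
        filled = sum(1 for e in events if e.get(field) not in (None, ""))
        rates[field] = filled / total if total else 0.0
    return rates


def data_gaps(rates: Dict[str, float], minimum: float = 0.99) -> List[str]:
    """Fields populated too rarely to build a detection on."""
    return sorted(f for f, r in rates.items() if r < minimum)


# Hypothetical sample: the second event has no account ID,
# and neither event carries sourceIPAddress.
sample = [
    {"eventName": "AssumeRole", "userIdentity.accountId": "111111111111"},
    {"eventName": "AssumeRole", "userIdentity.accountId": None},
]
rates = field_population(
    sample, ["eventName", "userIdentity.accountId", "sourceIPAddress"])
gaps = data_gaps(rates)
print(rates)
print(gaps)
```

A non-empty `gaps` list is the cue to open a data-gap ticket instead of writing the rule.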

Stage 03 · Logic

Now write the rule. Sigma for portability, KQL or SPL for native execution, SQL against your warehouse, or a model if the behavior is genuinely statistical. Resist the temptation to start here. Every rule born in Stage 03 without Stages 01 and 02 is a rule with no audit trail and no theory of why it should work.

Stage 04 · Validation

A detection that has never fired on a true positive is a wish. Use Atomic Red Team, Stratus Red Team, CALDERA, or hand-rolled replay PCAPs to generate the behavior and prove the rule catches it. Run validation in a non-production lane that mirrors production data shape. The output is a signed receipt: this rule, against this technique, produced this alert at this timestamp.
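A receipt can be a small immutable record. The sketch below is one possible shape, not a prescribed format: the class, its field names, and the sample values are hypothetical, and "signed" is approximated with a SHA-256 digest of the record, which shows the receipt has not been altered but is not a cryptographic signature.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ValidationReceipt:
    rule_id: str
    technique_id: str
    alert_id: str
    behavior_at: datetime  # when the test behavior was generated
    alert_at: datetime     # when the rule produced the alert

    @property
    def latency_seconds(self) -> float:
        return (self.alert_at - self.behavior_at).total_seconds()

    def digest(self) -> str:
        """SHA-256 over a canonical JSON form of the receipt."""
        payload = asdict(self)
        payload["behavior_at"] = self.behavior_at.isoformat()
        payload["alert_at"] = self.alert_at.isoformat()
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values for illustration.
receipt = ValidationReceipt(
    rule_id="example-rule-001",
    technique_id="T1078.004",
    alert_id="alert-0001",
    behavior_at=datetime(2026, 1, 15, 12, 0, 0, tzinfo=timezone.utc),
    alert_at=datetime(2026, 1, 15, 12, 4, 30, tzinfo=timezone.utc),
)
print(receipt.latency_seconds, receipt.digest())
```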

Stage 05 · Metrics & Lifecycle

Production is not the finish line, it is the start of measurement. Track true-positive rate, false-positive rate, mean time to detect, alert volume, analyst time per alert, and MITRE coverage delta. Every rule has a retirement trigger documented at birth: a date, a TP threshold, a superseding control. Rules without retirement triggers become technical debt.
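The three kinds of retirement trigger named above (a date, a TP threshold, a superseding control) can be evaluated mechanically. This is a minimal sketch under stated assumptions: the `RetirementPolicy` class, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RetirementPolicy:
    review_date: Optional[date] = None        # retire or re-justify on this date
    min_true_positives: Optional[int] = None  # TP floor over the lookback window
    superseded_by: Optional[str] = None       # control that would replace the rule


def has_trigger(policy: RetirementPolicy) -> bool:
    """True if at least one retirement trigger was documented."""
    return (policy.review_date is not None
            or policy.min_true_positives is not None
            or bool(policy.superseded_by))


def retirement_reasons(policy: RetirementPolicy, today: date,
                       true_positives_in_window: int,
                       superseding_control_live: bool) -> List[str]:
    """List every trigger that has been met; empty means keep the rule."""
    reasons = []
    if policy.review_date is not None and today >= policy.review_date:
        reasons.append("review date reached")
    if (policy.min_true_positives is not None
            and true_positives_in_window < policy.min_true_positives):
        reasons.append("true positives below threshold")
    if policy.superseded_by and superseding_control_live:
        reasons.append(f"superseded by {policy.superseded_by}")
    return reasons


policy = RetirementPolicy(review_date=date(2026, 12, 1), min_true_positives=1)
print(retirement_reasons(policy, date(2026, 6, 1), 0, False))
```

A rule whose policy fails `has_trigger` is a candidate for rejection at review time, before it reaches production.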

Step-by-step: detecting a cross-account assume-role anomaly

Theory is cheap. Walk through the framework with one concrete example: detecting a suspicious AWS sts:AssumeRole from an external account — a behavior aligned with MITRE T1078.004 (Valid Accounts: Cloud Accounts) and frequently observed in supply-chain and contractor-compromise scenarios.

Stage 01 in practice · Writing the hypothesis

The threat we care about: an attacker who has compromised credentials in an external AWS account will assume a role in our environment from an account ID we have never trusted before. The MITRE anchor is T1078.004. The expected evidence is a sts:AssumeRole CloudTrail event whose source account ID does not appear in our 90-day baseline of trusted principals.

Stage 02 in practice · Auditing the data

CloudTrail management events are the canonical source. We need eventName=AssumeRole, userIdentity.accountId (source), resources[0].accountId (target), sourceIPAddress, and userAgent. Retention is 90 days in CloudWatch Logs, 400 days in our data lake. Ingest delay is under 5 minutes. Coverage is complete — this hypothesis is data-supported.

Stage 03 in practice · The rule

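Below is the rule logic expressed as a SQL query against the CloudTrail lake table. The syntax assumes a Trino- or Athena-style engine; adjust the interval and array syntax for other warehouses. The account IDs shown are anonymised placeholders.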

-- detection: aws_sts_cross_account_assumerole_anomaly
-- mitre: T1078.004
-- data:  cloudtrail.management_events

WITH baseline_trusts AS (
  -- Source accounts with a successful AssumeRole between 90 days and 1 day ago.
  SELECT DISTINCT userIdentity_accountId AS source_acct
  FROM   cloudtrail.management_events
  WHERE  eventName = 'AssumeRole'
    AND  eventTime BETWEEN current_timestamp - INTERVAL '90' DAY
                       AND current_timestamp - INTERVAL '1'  DAY
    AND  errorCode IS NULL
    -- A single NULL in this set would make NOT IN below evaluate to unknown
    -- for every candidate, so the rule would silently never fire.
    AND  userIdentity_accountId IS NOT NULL
),
candidate_events AS (
  -- Successful AssumeRole calls from the last hour.
  -- resources[1] is the first array element (1-based indexing assumed),
  -- i.e. resources[0] in the raw CloudTrail JSON.
  SELECT eventTime,
         userIdentity_accountId AS source_acct,
         resources[1].accountId  AS target_acct,
         resources[1].ARN        AS assumed_role_arn,
         sourceIPAddress,
         userAgent,
         requestParameters_roleSessionName AS session_name
  FROM   cloudtrail.management_events
  WHERE  eventName = 'AssumeRole'
    AND  eventTime >= current_timestamp - INTERVAL '1' HOUR
    AND  errorCode IS NULL
)
SELECT *
FROM   candidate_events c
WHERE  c.source_acct NOT IN (SELECT source_acct FROM baseline_trusts)
  AND  c.source_acct NOT IN ('111111111111','222222222222') -- known partner allowlist
  AND  c.userAgent  NOT LIKE 'aws-internal/%';
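The same filter can be mirrored in plain Python, which is handy for unit-testing the logic against synthetic events before touching the lake. This is a sketch, not the production rule: the event dictionaries and their key names (`source_acct`, `user_agent`, `error_code`) are hypothetical. One deliberate difference: in SQL a NULL userAgent makes the NOT LIKE predicate unknown and drops the row, whereas this sketch keeps such events.

```python
from typing import Dict, Iterable, List, Optional, Set

Event = Dict[str, Optional[str]]


def baseline_accounts(history: Iterable[Event]) -> Set[str]:
    """Distinct source accounts with a successful AssumeRole in the baseline window."""
    return {
        e["source_acct"] for e in history
        if e.get("error_code") is None and e.get("source_acct") is not None
    }


def anomalies(candidates: Iterable[Event], baseline: Set[str],
              allowlist: Set[str]) -> List[Event]:
    """Successful AssumeRole events from accounts outside baseline and allowlist."""
    hits = []
    for e in candidates:
        acct = e.get("source_acct")
        if e.get("error_code") is not None or acct is None:
            continue  # failed call, or no account ID to compare
        if acct in baseline or acct in allowlist:
            continue  # already trusted
        if (e.get("user_agent") or "").startswith("aws-internal/"):
            continue  # AWS internal service traffic
        hits.append(e)
    return hits


history = [
    {"source_acct": "333333333333", "error_code": None},
    {"source_acct": None, "error_code": None},  # ignored: no account ID
]
candidates = [
    {"source_acct": "333333333333", "user_agent": "aws-cli/2", "error_code": None},
    {"source_acct": "444444444444", "user_agent": "aws-cli/2", "error_code": None},
    {"source_acct": "111111111111", "user_agent": "aws-cli/2", "error_code": None},
    {"source_acct": "555555555555", "user_agent": "aws-internal/3", "error_code": None},
    {"source_acct": "666666666666", "user_agent": "aws-cli/2", "error_code": "AccessDenied"},
]
hits = anomalies(candidates, baseline_accounts(history),
                 allowlist={"111111111111", "222222222222"})
print([h["source_acct"] for h in hits])
```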

Stage 04 in practice · Validating it fires

Spin up a sandbox AWS account that has never appeared in production logs. Use Stratus Red Team’s aws.persistence.iam-backdoor-role technique as the starting point: it backdoors a role’s trust policy so that an external account can assume it. Then, as a separate step, assume the role from the sandbox account. Confirm the candidate event appears in the lake within the ingest window and the rule fires. Capture the alert ID, the event timestamp, and the latency. File the receipt in the rule’s validation log.

Stage 05 in practice · Measuring it in the wild

Week one: fourteen alerts. Eleven trace to a CI/CD pipeline whose external automation account joined two days before the rule shipped — outside the 90-day baseline. Add it to the allowlist. Two trace to a backup vendor onboarding the same week. Allowlist. One is a true positive: a contractor whose laptop was compromised, the attacker pivoted into an unused AWS sandbox they had access to, and tried to assume a production role. MTTD: 11 minutes. Rule earns its keep.
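The week-one numbers are worth turning into arithmetic. The sketch below uses the counts from the paragraph above; the cluster labels and function names are hypothetical, and the "after" figure is a replay of the same week with the two benign clusters removed, not a forecast.

```python
from collections import Counter
from typing import Set

# Week-one dispositions: 11 CI/CD pipeline, 2 backup vendor, 1 true positive.
week_one = Counter({"ci_cd_pipeline": 11, "backup_vendor": 2, "true_positive": 1})


def precision(counts: Counter) -> float:
    """Share of alerts that were true positives."""
    total = sum(counts.values())
    return counts["true_positive"] / total if total else 0.0


def without(counts: Counter, clusters: Set[str]) -> Counter:
    """Replay the same alerts with the allowlisted clusters removed."""
    return Counter({k: v for k, v in counts.items() if k not in clusters})


before = precision(week_one)
after = precision(without(week_one, {"ci_cd_pipeline", "backup_vendor"}))
print(f"alerts={sum(week_one.values())} before={before:.3f} after={after:.3f}")
```

One true positive in fourteen alerts is roughly 7 percent precision, which is why the two allowlist entries matter as much as the catch itself.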

A rule template you can fork today

Every detection in a mature program ships with the same metadata. Below is the YAML template we use internally and the one our platform schema enforces at commit time. Names, fields, and structure should look boring — that is the point. Boring scales.

# detection: aws_sts_cross_account_assumerole_anomaly
# version: 1.0.0
# status:  production
name: AWS STS Cross-Account AssumeRole — Baseline Anomaly
id:   det-aws-sts-001
owner: [email protected]
severity: high
hypothesis: >
  An adversary with credentials in an external AWS account will
  attempt sts:AssumeRole into our environment from an account ID
  outside our 90-day baseline of trusted principals.

mitre:
  - tactic:    TA0001     # Initial Access
    technique: T1078.004  # Valid Accounts: Cloud Accounts
  - tactic:    TA0008     # Lateral Movement
    technique: T1550.001  # Use Alternate Authentication Material: Application Access Token

data_sources:
  - name:      cloudtrail.management_events
    fields:    [eventName, userIdentity.accountId, resources, sourceIPAddress, userAgent]
    retention: 400d
    latency:   < 5m

query:
  language: sql
  ref:      queries/aws_sts_cross_account_assumerole.sql

validation:
  framework: stratus-red-team
  technique: aws.persistence.iam-backdoor-role
  last_run:  2026-05-29T14:02:11Z
  result:    pass
  latency_ms: 642000

false_positives:
  - description: New partner onboarding (legitimate trust relationship created < 90 days ago)
    mitigation: Add account ID to allowlist after IAM team approval
  - description: AWS internal services (userAgent matches aws-internal/%)
    mitigation: Excluded in query

runbook:
  url: https://runbooks.example.com/det-aws-sts-001
  steps:
    - Identify the source account ID and check threat intel feeds
    - Pull all activity for the assumed role session in the 60 minutes after the AssumeRole event
    - Contact account owner via verified channel (not email)
    - If unauthorised: revoke session, rotate role trust policy, open IR ticket

retirement:
  trigger:    Fewer than 1 true positive in 180 days
  superseded_by: AWS IAM Access Analyzer external access findings (when GA in our region)
  review_date: 2026-12-01
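A commit-time check for this template can be small. The sketch below is a hypothetical linter, not the HuntIntel platform schema: it assumes the YAML has already been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`), and the required-key list is simply the top-level keys of the template above.

```python
from typing import Any, Dict, List

# Top-level keys of the template; an assumption about what "required" means.
REQUIRED_KEYS = ("name", "id", "owner", "severity", "hypothesis", "mitre",
                 "data_sources", "query", "validation", "false_positives",
                 "runbook", "retirement")


def lint_rule(rule: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the rule may be committed."""
    errors = [f"missing required key: {key}"
              for key in REQUIRED_KEYS if key not in rule]
    for i, entry in enumerate(rule.get("mitre") or []):
        if not entry.get("technique"):
            errors.append(f"mitre[{i}] has no technique")
    if (rule.get("validation") or {}).get("result") != "pass":
        errors.append("validation.result is not 'pass'")
    retirement = rule.get("retirement") or {}
    if not (retirement.get("trigger") or retirement.get("review_date")):
        errors.append("retirement needs a trigger or a review_date")
    return errors


# A deliberately incomplete rule: no runbook, no retirement, failed validation.
draft = {
    "name": "Example", "id": "det-example-001", "owner": "team",
    "severity": "high", "hypothesis": "An adversary with X will attempt Y.",
    "mitre": [{"tactic": "TA0001"}],
    "data_sources": [], "query": {}, "false_positives": [],
    "validation": {"result": "fail"},
}
for problem in lint_rule(draft):
    print(problem)
```

Wiring a check like this into CI keeps the metadata boring by construction rather than by convention.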

Failure analysis: when detections lie

Every detection fails. The question is whether the failure surfaces as a fixable signal or as silent erosion. Two failure modes dominate.

Failure mode 1 · The rule fires too often

You will know it has happened when the on-call rotation starts using the word “muted” in standup. The instinct is to raise the threshold or narrow the filter. The discipline is to ask a different question: what did the hypothesis miss? A noisy rule almost always indicates a benign sub-population of the behavior you did not anticipate in Stage 01.

Concrete remediation steps, in order:

1. Cluster the alerts. Group the last 30 days of fires by the most distinguishing field: source account, user agent, session name, or hour of day. Eighty percent of the noise usually lives in two or three clusters.
2. Validate each cluster. For each cluster, ask: is this benign, malicious, or unknown? Anything “unknown” is a tuning failure, not a noise failure.
3. Encode the carve-out as data, not logic. Allowlists belong in a tracked YAML file or table, with owners and expiry dates. Allowlists that live inside the WHERE clause are landmines.
4. Update the hypothesis. Rewrite Stage 01 to reflect the new understanding. The hypothesis is a living document, not a tombstone.
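The clustering and allowlist steps above can be sketched in a few lines. Everything here is illustrative: the alert dictionaries, the `top_clusters` and `active_allowlist` helpers, and the allowlist entry fields (`value`, `owner`, `expires`) are hypothetical names, and the 80 percent cut-off simply mirrors the rule of thumb in the text.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Set, Tuple


def top_clusters(alerts: List[Dict[str, str]], field: str,
                 coverage: float = 0.8) -> List[Tuple[str, int]]:
    """Largest clusters by `field` that together reach `coverage` of all alerts."""
    counts = Counter(a.get(field, "<missing>") for a in alerts)
    total = sum(counts.values())
    picked, covered = [], 0
    for value, n in counts.most_common():
        if covered / total >= coverage:
            break
        picked.append((value, n))
        covered += n
    return picked


def active_allowlist(entries: List[dict], today: date) -> Set[str]:
    """Allowlist values whose expiry date has not passed."""
    return {e["value"] for e in entries if e["expires"] > today}


# Ten hypothetical alerts: six from account A, three from B, one from C.
alerts = ([{"source_acct": "A"}] * 6 + [{"source_acct": "B"}] * 3
          + [{"source_acct": "C"}])
print(top_clusters(alerts, "source_acct"))

# Allowlist kept as data, each entry with an owner and an expiry date.
entries = [
    {"value": "A", "owner": "iam-team", "expires": date(2026, 12, 31)},
    {"value": "B", "owner": "iam-team", "expires": date(2026, 1, 31)},
]
print(active_allowlist(entries, today=date(2026, 6, 1)))
```

Because entries expire, a forgotten carve-out resurfaces as alerts instead of lingering as a permanent blind spot.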

Failure mode 2 · The detection missed a known incident

The retrospective starts with shame and a Jira ticket. Skip the shame. The framework gives you a structured post-mortem: walk the five stages backwards.

Stage 05 (Metrics): Did the rule fire and get ignored? Then this is an alerting failure, not a detection failure.
Stage 04 (Validation): Has this exact behavior ever been replayed in the lab? If no, validation gap.
Stage 03 (Logic): Was the event present in the data but not matched by the rule? Then it is a logic gap, usually a filter that was too aggressive.
Stage 02 (Data): Was the event in the logs at all? If no, data gap — ingest, parsing, or retention.
Stage 01 (Hypothesis): Was this behavior ever in scope? If no, you have learned something the threat model did not.
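The backwards walk can be written down as a small decision function so that every post-mortem asks the same questions in the same order. This is one possible reading of the checklist above, not a definitive procedure; the `MissFacts` fields and the finding strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MissFacts:
    rule_fired: bool         # Stage 05: did an alert exist and get ignored?
    replayed_in_lab: bool    # Stage 04: was this exact behavior ever replayed?
    event_in_logs: bool      # Stage 02: was the event in the logs at all?
    behavior_in_scope: bool  # Stage 01: did the threat model cover it?


def walk_backwards(f: MissFacts) -> List[str]:
    """Return the gaps found, ordered from Stage 05 down to Stage 01."""
    if f.rule_fired:
        return ["Stage 05: alerting failure (the rule fired and was ignored)"]
    findings = []
    if not f.replayed_in_lab:
        findings.append("Stage 04: validation gap")
    if not f.event_in_logs:
        findings.append("Stage 02: data gap")
    elif f.behavior_in_scope:
        # The event was visible and in scope, yet the rule did not match it.
        findings.append("Stage 03: logic gap")
    if not f.behavior_in_scope:
        findings.append("Stage 01: threat-model gap")
    return findings


print(walk_backwards(MissFacts(False, True, True, True)))
```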

Each failure mode produces a different remediation. The framework lets you find the right one in minutes rather than weeks. For deeper guidance on hunting hypotheses that feed back into detection logic, see our cloud threat hunting series.

SOC alerting vs detection engineering

The two disciplines are often conflated. They are not the same job and require different success metrics. The table below summarises the divide.

Dimension	Traditional SOC Alerting	Detection Engineering
Primary unit of work	An alert	A detection
Time horizon	Minutes (per incident)	Quarters (per rule lifecycle)
Source of new rules	Vendor content packs, ad-hoc requests	Hypotheses tied to threat model + MITRE
Validation	It alerted in prod once — ship	Reproducible lab replay before prod
Tuning	Raise threshold until noise stops	Refine hypothesis, encode allowlists as data
Success metric	Alerts triaged per hour	True-positive rate, MTTD, MITRE coverage
Retirement	Rules linger forever	Each rule has a documented sunset trigger
Tooling	SIEM console	Git, CI, data lake, validation harness
Failure response	Disable rule, move on	Walk five stages backwards, fix root cause
Output of a quarter	Closed tickets	Improved coverage & precision telemetry

Healthy programs run both functions side by side. The SOC handles real-time triage and incident response. Detection engineering produces and maintains the rules the SOC consumes. The contract between them is the rule’s runbook and its expected TP rate. Pair this article with our MITRE coverage guide for the macro view of how individual detections roll up into program-level visibility.

Frequently asked questions

How many detections should a mature program maintain?

There is no magic number. Coverage matters more than count. A team that covers 60 percent of the techniques relevant to its threat model with 80 high-precision rules is in a healthier position than a team with 800 noisy rules covering 95 percent. Track coverage and precision together — either alone is misleading. Our MITRE coverage page expands on this.
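Tracking the two numbers together takes very little code. In the sketch below the function names are hypothetical and the technique IDs and alert counts are illustrative, not a recommended threat model.

```python
from typing import Set


def technique_coverage(relevant: Set[str], detected: Set[str]) -> float:
    """Share of threat-model techniques with at least one detection."""
    return len(relevant & detected) / len(relevant) if relevant else 0.0


def program_precision(true_positives: int, false_positives: int) -> float:
    """Share of all alerts that were true positives."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


# Illustrative threat model and detection inventory.
relevant = {"T1078.004", "T1550.001", "T1098", "T1136.003", "T1530"}
detected = {"T1078.004", "T1550.001", "T1530", "T1059"}

cov = technique_coverage(relevant, detected)
prec = program_precision(true_positives=40, false_positives=10)
print(f"coverage={cov:.0%} precision={prec:.0%}")
```

Note that T1059 in the inventory does not raise coverage because it is outside the threat model: detections for irrelevant techniques add count, not coverage.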

Should every detection be tied to a MITRE technique?

Yes, with a caveat. Most should map cleanly. Some — data exfiltration volume anomalies, business-logic abuse, fraud signals — do not have clean MITRE anchors. For those, document the threat model linkage explicitly in the hypothesis field. The rule still needs a why; it just does not need a T-ID.

How do detection engineering and threat hunting differ?

Threat hunting is exploratory and hypothesis-driven: it looks for malicious activity in your data that existing detections have not caught. Detection engineering takes successful hunts and turns them into repeatable, automated, validated detections. Hunting produces the input; detection engineering produces the durable artefact. See our threat hunting overview for the hunting side.

Do we need a data lake or is the SIEM enough?

The SIEM is enough for real-time matching. The lake is required for baseline computation, retroactive sweeps when a new IOC drops, and historical validation. Most mature programs run both: SIEM for hot-path detection, lake for cold-path engineering and hunting. The split is not religious; it is a cost and latency optimisation.

How often should detections be revalidated?

Monthly for cloud detections (schemas drift), quarterly for endpoint and identity. Always revalidate when the underlying data source changes, when a vendor pushes a parser update, or when MITRE publishes a new sub-technique that touches the rule. Validation is cheap; surprise blindness is expensive.
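That cadence is easy to automate. The sketch below treats "monthly" as 30 days and "quarterly" as 90 days, which is an assumption, and the function and parameter names are hypothetical.

```python
from datetime import date, timedelta

# Assumed intervals: monthly = 30 days, quarterly = 90 days.
REVALIDATION_DAYS = {"cloud": 30, "endpoint": 90, "identity": 90}


def revalidation_due(category: str, last_validated: date, today: date,
                     data_source_changed: bool = False,
                     parser_updated: bool = False,
                     new_subtechnique: bool = False) -> bool:
    """True if the rule should be revalidated now."""
    # Any of the change events forces revalidation regardless of the calendar.
    if data_source_changed or parser_updated or new_subtechnique:
        return True
    interval = timedelta(days=REVALIDATION_DAYS[category])
    return today - last_validated >= interval


print(revalidation_due("cloud", date(2026, 1, 1), date(2026, 1, 31)))
print(revalidation_due("endpoint", date(2026, 1, 1), date(2026, 3, 1)))
```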

What is the right team size for detection engineering?

One full-time detection engineer per 8 to 12 high-value detections in flight, plus shared SRE-style on-call for production rule health. Smaller programs combine the role with senior SOC analyst time. Larger programs split into platform (data, validation harness, tooling) and content (rule authoring) tracks.

Where does threat intelligence fit?

Threat intelligence feeds the hypothesis stage. New campaign reporting, new TTP analysis, new IOC clusters — each is a candidate hypothesis. Our threat intelligence programme integrates directly with the detection backlog so reports do not die in a SharePoint folder.
