How the platform fits together
Seven platform stages, four data-flow paths. Sources flow into ingest, enrichment, and the indicator catalogue, then through the Intelligence Cluster Analytics layer, where machine learning, time-series correlation, threat attribution, and Sigma calculation separate signal from noise, before streaming to your operational stack.
End-to-end open architecture. Every step explainable. Every record traceable back to its source feed.
Platform architecture, end to end
Stage 01 — Sources. Seven feed classes contribute to the catalogue. OSINT feeds (over thirty curated sources) provide the volume baseline. Commercial threat-intelligence feeds add premium signal where licensing permits redistribution. Sandbox detonations contribute fresh first-hand observations of malware behaviour. TLS and DNS observatories provide network-layer fingerprints, such as JA3, JA4, and JARM for TLS. Honeynet sinkholes capture active scanning and exploitation attempts. Government CERT advisories add validated public intelligence. Each source carries its own confidence reputation, used downstream in scoring.
Stage 02 — Ingest. Every indicator passes through seven processing steps before it reaches the catalogue. Format parsers handle STIX 2.1, JSON, CSV, and XML. Schema normalisation projects every feed into the same canonical IOC schema. Validation rejects malformed or out-of-range records. Deduplication catches both exact and near-duplicates against the existing catalogue. Rate limiting prevents any single source from overwhelming the pipeline. Source health monitoring tracks uptime and freshness per feed. An audit log captures the full provenance trail.
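The normalise, validate, and deduplicate steps above can be sketched roughly as follows. This is an illustration only, not the platform's implementation: the canonical field names (`type`, `value`, `source`), the allowed indicator types, and the function names are all assumptions, and only exact-duplicate detection is shown (near-duplicate matching is left out).

```python
import ipaddress
from dataclasses import dataclass


# Hypothetical canonical IOC schema; the real schema is not described here.
@dataclass(frozen=True)
class Indicator:
    type: str    # e.g. "ipv4", "domain", "sha256"
    value: str   # normalised indicator value
    source: str  # feed the record came from


ALLOWED_TYPES = {"ipv4", "domain", "sha256"}  # assumed subset of indicator types


def normalise(raw: dict, source: str) -> Indicator:
    """Project a parsed feed record onto the assumed canonical schema."""
    ioc_type = str(raw.get("type", "")).strip().lower()
    value = str(raw.get("value", "")).strip()
    if ioc_type in {"domain", "sha256"}:
        # Case-insensitive types are lower-cased; a trailing dot is dropped.
        value = value.lower().rstrip(".")
    return Indicator(ioc_type, value, source)


def is_valid(ioc: Indicator) -> bool:
    """Reject malformed or out-of-range records."""
    if ioc.type not in ALLOWED_TYPES or not ioc.value:
        return False
    if ioc.type == "ipv4":
        try:
            ipaddress.IPv4Address(ioc.value)
        except ValueError:
            return False
    if ioc.type == "sha256":
        return len(ioc.value) == 64 and all(c in "0123456789abcdef" for c in ioc.value)
    return True


def ingest(records, source, catalogue_keys):
    """Return new, valid indicators.

    catalogue_keys is a set of (type, value) pairs already in the catalogue;
    it is updated in place so later records in the same batch are deduplicated too.
    """
    accepted = []
    for raw in records:
        ioc = normalise(raw, source)
        key = (ioc.type, ioc.value)
        if not is_valid(ioc) or key in catalogue_keys:
            continue
        catalogue_keys.add(key)
        accepted.append(ioc)
    return accepted
```

Normalising before deduplicating matters here: `Evil.Example.` and `evil.example` only collapse into one record once both have been projected onto the same canonical form.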
Stage 03 — Enrich. Seven enrichment passes attach the metadata that makes indicators useful for detection. Confidence scoring uses the RFCF (Relative Feed Confidence Factor) algorithm to combine source reputation, corroboration across feeds, recency, and category fit into a score from 0 to 100. MITRE ATT&CK tagging infers the technique the indicator supports. Severity assignment rates each indicator from Critical through Low. Geographic and ASN tagging maps IPs to country and hosting organisation. Industry tagging captures sector context. Actor attribution links the indicator to known adversary clusters where applicable. CVE linking cross-references against NVD when the indicator is associated with a known vulnerability.
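To make the confidence-scoring idea concrete, here is a small sketch of how four factors could be blended into a 0 to 100 score. The RFCF formula itself is not given in this description, so everything below is an assumption for illustration: the weights, the 30-day recency half-life, the five-feed corroboration cap, and the function names.

```python
# Assumed weights; the actual RFCF weighting is not published in this description.
WEIGHTS = {
    "reputation": 0.4,
    "corroboration": 0.3,
    "recency": 0.2,
    "category_fit": 0.1,
}


def clamp01(x: float) -> float:
    """Clamp a value into the range [0, 1]."""
    return min(max(x, 0.0), 1.0)


def recency_factor(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 when fresh, 0.5 after one (assumed) half-life."""
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def corroboration_factor(feed_count: int, saturation: int = 5) -> float:
    """Share of the way to `saturation` independent feeds, capped at 1.0."""
    return min(max(feed_count, 0), saturation) / saturation


def confidence_score(reputation: float, feed_count: int,
                     age_days: float, category_fit: float) -> int:
    """Blend the four factors into an integer score from 0 to 100."""
    factors = {
        "reputation": clamp01(reputation),
        "corroboration": corroboration_factor(feed_count),
        "recency": recency_factor(age_days),
        "category_fit": clamp01(category_fit),
    }
    blended = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    return round(100 * clamp01(blended))
```

Under these assumed weights, an indicator from a source with reputation 0.8, seen in three feeds, thirty days old, with a category fit of 0.5, scores 65. A weighted blend keeps each factor's contribution easy to explain, which suits a pipeline that aims for every step to be explainable.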
Stage 04 — Catalogue. The catalogue is the platform’s single source of truth. Indicator records are stored in time-partitioned Parquet tables covering 2010 through the current week, with each partition optimised for the query patterns that target it. Thousands of adversary profiles are maintained alongside, each recomputed continuously as new indicators land. The CVE-to-IOC linking layer cross-references hundreds of CVEs. A dedicated graph layer maintains IOC clusters, CIDR clusters, the C2 network view, and the adversary-to-infrastructure edges. The query plane runs on Athena with sub-100-millisecond p95 lookup latency; snapshot replication provides durability, and PII tokenisation is enforced at ingest.
Stage 05 — Intelligence Cluster Analytics (ML stage). This stage turns raw catalogue records into operational intelligence by separating signal from noise. Four sub-blocks run continuously against the catalogue. Machine Learning runs unsupervised clustering, anomaly-scoring, sequence-aware, ensemble-classification, and graph-embedding models, each targeted at a specific class of adversary behaviour. Time-series Correlation applies frequency-domain periodicity extraction, autocorrelation analysis, statistical forecasting, seasonal decomposition, causality testing, change-point detection, sequence alignment, and multi-resolution analysis. Threat Attribution uses statistical operator inference, TTP signature matching, infrastructure-overlap analysis, kill-chain phase mapping, cluster proximity, and campaign lineage tracking with alias resolution. Sigma (σ) Calculation applies statistical band scoring, outlier thresholding, score normalisation, distance-metric anomaly detection, bootstrapped confidence intervals, model evaluation metrics, feature-importance analysis, precision/recall tuning, false-positive curve calibration, and cross-validation. The combined effect: every indicator that lands in an output channel has been ML-filtered, statistically scored, and attribution-tagged before it reaches your SIEM.
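As a rough illustration of the statistical band scoring and outlier thresholding named above, the sketch below scores an observation by how many standard deviations it sits from a baseline. It is a plain z-score, not the platform's Sigma Calculation; the band edges (1σ, 2σ, 3σ), the 3σ outlier threshold, and the function names are assumptions.

```python
from statistics import mean, pstdev


def sigma_score(value: float, baseline: list) -> float:
    """Number of standard deviations between `value` and the baseline mean."""
    mu = mean(baseline)
    sd = pstdev(baseline)  # population standard deviation of the baseline
    if sd == 0:
        # A flat baseline: any deviation at all is infinitely unusual.
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sd


def sigma_band(z: float, bands=(1.0, 2.0, 3.0)) -> int:
    """Count how many band edges |z| reaches: 0 is within 1σ, 3 is at or beyond 3σ."""
    return sum(abs(z) >= edge for edge in bands)


def is_outlier(value: float, baseline: list, threshold: float = 3.0) -> bool:
    """Flag values at or beyond the (assumed) 3σ threshold."""
    return abs(sigma_score(value, baseline)) >= threshold
```

For a baseline of `[2, 4, 4, 4, 5, 5, 7, 9]` (mean 5, standard deviation 2), an observation of 11 lands exactly on the 3σ edge and is flagged, while 7 sits at 1σ and is not.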
Stage 06 — Outputs. Seven output channels expose ML-scored intelligence to your operational tooling. A REST API documented in OpenAPI 3.0 supports ad-hoc query from analyst dashboards. STIX 2.1 streams support both push and pull semantics. SOAR webhooks deliver enrichment on publish. SIEM exports in CSV, JSON, or Parquet support bulk ingest. A change feed exposes indicator deltas the moment they land. The detection-as-code path integrates with your GitOps workflow for rule lifecycle management. The weekly advisory pipeline delivers the Monday briefing via email or webhook.
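For a sense of what a record on the STIX 2.1 stream might look like, the sketch below builds a minimal Indicator object with the properties the STIX 2.1 specification requires, plus the optional `confidence` property. Which properties the platform actually populates is not stated here; mapping the platform's 0 to 100 score onto STIX `confidence`, and the function names, are assumptions.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as a UTC timestamp with millisecond precision."""
    utc = dt.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def make_indicator(ipv4: str, confidence: int, now: datetime = None) -> dict:
    """Build a minimal STIX 2.1 Indicator for an IPv4 address."""
    if not 0 <= confidence <= 100:
        raise ValueError("STIX confidence must be between 0 and 100")
    ts = stix_timestamp(now or datetime.now(timezone.utc))
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": ts,
        "modified": ts,
        "pattern": f"[ipv4-addr:value = '{ipv4}']",
        "pattern_type": "stix",
        "valid_from": ts,
        "confidence": confidence,  # assumed to carry the platform's 0-100 score
    }


if __name__ == "__main__":
    print(json.dumps(make_indicator("198.51.100.7", 65), indent=2))
```

A consumer on your side can parse these objects with any JSON library and route on `pattern` and `confidence` without needing platform-specific fields.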
Stage 07 — Your stack. Outputs land on your side of the trust boundary. Your SIEM ingests indicator deltas. Your SOAR enriches alerts on demand. Your EDR consumes high-confidence hash and process indicators. Your GitOps workflow ships detection rule updates. Your incident response runbooks pivot on platform context. Your analyst console correlates platform findings with internal telemetry. Your compliance audit chain leverages the provenance trail. Everything that matters operationally stays on your side; the catalogue, the ML models, and the trained weights stay on ours.
What stays on your side
Your raw telemetry does not leave your environment. The hunt-execution layer connects to your S3-stored VPC Flow Logs, CloudTrail, and Kubernetes audit logs via an Athena-backed query plane that runs in your AWS account. Only the hunt definitions and the resulting hit metadata flow back to the operator console. The dashed line in the diagram marks this boundary explicitly.
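The boundary described above can be pictured as a reduction step that runs on your side: raw rows go in, and only per-indicator hit metadata comes out. The sketch below shows that idea in plain Python; in practice the reduction would more likely be expressed in the Athena query itself. The function name, the shape of the returned metadata, and the choice of fields are assumptions; `dstaddr` and `start` are field names from the VPC Flow Logs record format.

```python
from collections import defaultdict


def summarise_hits(hunt_id: str, rows, indicator_values: set) -> dict:
    """Reduce raw flow-log rows to hit metadata for the given indicators.

    Only counts and first/last-seen times per matched indicator are returned;
    source addresses and the raw rows themselves are never included.
    """
    hits = defaultdict(lambda: {"count": 0, "first_seen": None, "last_seen": None})
    for row in rows:
        dst = row.get("dstaddr")
        if dst not in indicator_values:
            continue
        ts = row["start"]  # flow start time, in Unix seconds
        hit = hits[dst]
        hit["count"] += 1
        hit["first_seen"] = ts if hit["first_seen"] is None else min(hit["first_seen"], ts)
        hit["last_seen"] = ts if hit["last_seen"] is None else max(hit["last_seen"], ts)
    return {
        "hunt_id": hunt_id,
        "hits": [{"indicator": value, **meta} for value, meta in sorted(hits.items())],
    }
```

Because the summary is built from an explicit allow-list of output fields, anything not named in it (internal source addresses, ports, byte counts) stays in your environment by construction.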
The catalogue and ML models stay on our side. The millions of indicators, the thousands of adversary profiles, the cluster graph, the trained ML weights, and the model versioning history are the value we bring to the relationship. They are maintained on platform infrastructure, kept fresh through continuous ingest and continuous training, and exposed through versioned, documented interfaces.
Audit trails on both sides. Every action on the platform is logged with user identity, timestamp, action type, and target. Logs are exportable and retention is configurable per tenant. The audit chain supports compliance frameworks that require demonstrable evidence of access controls.
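An audit record with the four fields listed above, a per-tenant retention filter, and a JSON Lines export could look roughly like this. The export format, the class and function names, and the retention semantics (drop anything older than N days) are assumptions for illustration, not the platform's documented behaviour.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AuditEvent:
    user: str            # user identity
    timestamp: datetime  # when the action happened (timezone-aware)
    action: str          # action type, e.g. "export" or "login"
    target: str          # what the action was applied to

    def to_json(self) -> str:
        record = asdict(self)
        record["timestamp"] = self.timestamp.isoformat()
        return json.dumps(record, sort_keys=True)


def apply_retention(events, retention_days: int, now: datetime):
    """Keep only events inside the tenant's configured retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [event for event in events if event.timestamp >= cutoff]


def export_jsonl(events) -> str:
    """Serialise events as JSON Lines, oldest first."""
    ordered = sorted(events, key=lambda event: event.timestamp)
    return "\n".join(event.to_json() for event in ordered)
```

One JSON object per line keeps the export easy to stream into a SIEM or hand to an auditor as a flat file.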




