The Maturity Matrix
Foundations
Section titled “Foundations”Strategy & Governance
Section titled “Strategy & Governance”| L0: None | L1: Experimental | L2: Integrated | L3: Autonomous |
|---|---|---|---|
| No AI strategy for Detection Engineering. AI is either out of scope or actively prohibited pending policy. | Unofficial acceptance of individual AI use. No written policy, no leadership visibility, no risk acceptance. | Written AI usage policy in force: data-sensitivity tiers, approved models/vendors, escalation paths. Leadership sponsors a defined pilot scope. AI-specific risks (data egress, hallucination impact, model regression) are on the risk register. | AI is a funded line in the Detection Engineering roadmap with reported ROI. Model and vendor reviews are periodic. Governance forum reviews agent autonomy expansions. Policy is enforced by tooling, not just memo. |
People & Skills
Section titled “People & Skills”| L0: None | L1: Experimental | L2: Integrated | L3: Autonomous |
|---|---|---|---|
| No AI literacy expected or developed. No training. AI experience isn’t a hiring criterion. | Individual self-teaching. A few engineers are heavy AI users. Most aren’t. Skill is unevenly distributed and tribal. | Team-wide AI fluency baseline: prompt design, RAG behavior, oversight discipline. Reviewing AI output is a defined competency. Hiring assesses AI workflow experience. | Engineers design and tune agents, write quality control evals, and audit AI behavior. Oversight discipline is taught and measured. Career ladders include AI-native Detection Engineering roles distinct from “user of AI tools.” |
Tooling & Infrastructure
Section titled “Tooling & Infrastructure”| L0: None | L1: Experimental | L2: Integrated | L3: Autonomous |
|---|---|---|---|
| Conventional Detection Engineering tooling only: SIEM consoles, detection-as-code repo, ticketing. No AI surface of any kind. | Public chatbots accessed in a browser. No IDE integration, no logging, no rate or cost controls, no shared configuration. | Sanctioned AI surface (IDE plugin, internal app, or MCP-enabled tools) with logging, observability, model selection, and cost/rate controls. Self-hosted or contracted SaaS with documented data-use terms. | Mature agentic infrastructure: orchestration, sandboxed tool execution, structured outputs, multi-model routing, cost/latency budgets, deep observability of agent decisions. |
Data & Knowledge Management
Section titled “Data & Knowledge Management”| L0: None | L1: Experimental | L2: Integrated | L3: Autonomous |
|---|---|---|---|
| No AI-accessible knowledge corpus. Tribal knowledge lives in engineers’ heads and scattered wikis. | Engineers paste snippets and reports into chatbots ad-hoc. No indexed corpus, no retrieval, no curation. | Maintained RAG corpora over the detection repo, log/schema docs, runbooks, and at least one Threat Intel feed. Corpora are owned, scheduled for refresh, and version-controlled. | Telemetry-as-context: agents query live data shapes, incident graphs, and threat-intel knowledge graphs. Prompt libraries, eval datasets, and rejected outputs are versioned, first-class assets. |
Evaluation & QA
Section titled “Evaluation & QA”| L0: None | L1: Experimental | L2: Integrated | L3: Autonomous |
|---|---|---|---|
| No AI output to evaluate. | No formal evals. “Looks right to me” review at point of use. Wins and failures are anecdotal. | Golden datasets exist for the main AI use cases (rule authoring, Threat Intel extraction, triage). Regression suites run in CI. Hallucination, FP impact, and drift are tracked. | Continuous evals gate production deploys. Outcome-based metrics (post-deploy FP/TP, MTTD deltas) feed back into evals. Red-team eval suites cover prompt injection, jailbreaks, and policy bypass. |
Security, Privacy & Safety
Section titled “Security, Privacy & Safety”| L0: None | L1: Experimental | L2: Integrated | L3: Autonomous |
|---|---|---|---|
| Not applicable. No AI surface to secure. | No prompt-injection or data-egress controls. PII, secrets, customer data, and proprietary detection logic may leave the org via public chatbots. | Data-classification enforced at the AI surface: PII/secret redaction, allowed-source lists, model-routing based on sensitivity. Outputs validated against schema/policy before action. Model and provider SBOM tracked. | Prompt-injection defenses red-teamed on a schedule. Output validators block unsafe tool use and dangerous actions. Model supply chain monitored for regressions and provenance. AI-specific incident-response runbooks exercised. |
Detection Lifecycle
Section titled “Detection Lifecycle”Detection Opportunity Ideation
Section titled “Detection Opportunity Ideation”| L0: None | L1: Experimental | L2: Integrated | L3: Autonomous |
|---|---|---|---|
| Manual ideation from Threat Intel reports, Incident Response post-mortems, red/purple team findings, and threat modeling exercises. Detection opportunities surface in heads and meetings. | Engineers paste reports, post-mortems, or assessment write-ups into a chatbot for ad-hoc summarization or IOC/TTP extraction. Output is unstructured and unverified. | Sanctioned tooling extracts detection opportunities from multiple input sources (Threat Intel, Incident Response post-mortems, red/purple team reports, threat models) into a normalized format. Opportunities are contextualized against the org’s environment, telemetry, and prior coverage. | Agentic pipeline runs continuously: ingest from all sources, enrich, contextualize against org telemetry, propose prioritized detection opportunities based on coverage gaps and observed adversary activity. Opportunities are tied to observable evidence in-org and weighted by source confidence. |
Detection Authoring
Section titled “Detection Authoring”| L0: None | L1: Experimental | L2: Integrated | L3: Autonomous |
|---|---|---|---|
| Engineers hand-write all detections from scratch. No AI assistance. | Engineers paste threat reports into ChatGPT for ad-hoc rule drafts. No shared prompts, no eval, no schema awareness. Output quality varies wildly per engineer. | Shared AI assistant with RAG over the org’s detection corpus, log schema, and style guide. Generated detections require human review before merge. Prompts and tool definitions are version-controlled and maintained in git. | Agents draft, test, and open PRs for detections from detection opportunity inputs. Humans approve PRs. They don’t author them. Cross-platform translation (e.g., Sigma to KQL to SPL) is automated. Quality control gates block low-quality submissions. |
Detection Testing & Validation
Section titled “Detection Testing & Validation”| L0: None | L1: Experimental | L2: Integrated | L3: Autonomous |
|---|---|---|---|
| Manual review if any. Mostly visual inspection of rules before deploy. | Engineers occasionally ask a chatbot for test ideas or sample log lines. No structured generation, no execution. | AI generates atomic-style tests aligned to each new detection. Tests execute in a validation environment as part of CI. FP rate is estimated against a sample of historical telemetry before merge. | Agents predict FP rate against full historical telemetry, generate adversary-emulation scenarios mapped to MITRE techniques, and gate PRs on validation outcomes. Failed validations become new evals automatically. |
Tuning, Coverage & Continuous Improvement
Section titled “Tuning, Coverage & Continuous Improvement”| L0: None | L1: Experimental | L2: Integrated | L3: Autonomous |
|---|---|---|---|
| Manual tuning, reactive and ticket-driven. MITRE mapping is sparse or absent. | Engineers ask chatbots for tuning ideas case-by-case. No systematic loop, no coverage view. | AI assists with FP root-cause analysis and recommends tuning changes. Coverage maps (MITRE, crown-jewel) are AI-maintained and reviewed. Recommendations require human approval. | Continuous AI-driven tuning, coverage-gap analysis, and prioritization. Post-deploy outcomes feed agent learning. Detection portfolio health (coverage, FP-rate, age, ownership) is reported automatically to leadership. |