From c369229dcbfa91cb1d9b2dfd377efa14231da1ad Mon Sep 17 00:00:00 2001
From: DENGXUELIN <37065511+DENGXUELIN@users.noreply.github.com>
Date: Mon, 8 Jun 2026 20:03:56 +0800
Subject: [PATCH] Improve detection data source health gates

---
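The "Required field coverage" check in Step 1.5 and Pitfall 6 (zero matches caused by renamed fields) can be illustrated with a short Python sketch. It is not part of the skill and makes three assumptions. First, `field_coverage` is a hypothetical helper. Second, the sample event is assumed to be a flat dict with dotted field names. Third, the rename map is copied from the `field_drift` entries in the stale-EDR fixture, and the event values are illustrative only.

```python
# Hypothetical helper (not part of the skill): classify each Sigma field used by
# a rule as present, renamed (per a known parser rename map), or absent in a
# sample event taken from current telemetry.
from __future__ import annotations


def field_coverage(
    sigma_fields: list[str],
    sample_event: dict[str, object],
    known_renames: dict[str, str],
) -> tuple[list[str], dict[str, str], list[str]]:
    present: list[str] = []
    renamed: dict[str, str] = {}
    absent: list[str] = []
    for name in sigma_fields:
        if name in sample_event:
            present.append(name)
        elif known_renames.get(name) in sample_event:
            # The data exists under a new name; the backend field mapping is stale.
            renamed[name] = known_renames[name]
        else:
            absent.append(name)
    return present, renamed, absent


sigma_fields = ["Image", "CommandLine", "ParentImage"]
# Illustrative event shape after the edr-process-v5 migration (flattened keys assumed).
sample_event = {
    "Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
    "process.command_line": "powershell.exe -EncodedCommand ...",
    "parent.process.path": r"C:\Windows\System32\cmd.exe",
}
renames = {
    "CommandLine": "process.command_line",
    "ParentImage": "parent.process.path",
}
present, renamed, absent = field_coverage(sigma_fields, sample_event, renames)
print(present)  # ['Image']
print(renamed)  # {'CommandLine': 'process.command_line', 'ParentImage': 'parent.process.path'}
print(absent)   # []
```

Any renamed or absent field means "Required Fields Present" cannot be recorded as `yes`. In that case the Sigma backend field mapping should be updated and the replay rerun before coverage is marked operational.
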
 skills/secops/detection-engineering/SKILL.md  | 32 ++++++++++++++
 ...urrent-telemetry-replay-schema-verified.md | 43 +++++++++++++++++++
 .../stale-edr-parser-zero-match-coverage.md   | 41 ++++++++++++++++++
 3 files changed, 116 insertions(+)
 create mode 100644 skills/secops/detection-engineering/tests/benign/current-telemetry-replay-schema-verified.md
 create mode 100644 skills/secops/detection-engineering/tests/vulnerable/stale-edr-parser-zero-match-coverage.md

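The Step 1.5 classification guidance can be expressed as a small decision function. The following is a minimal Python sketch, and it rests on several assumptions:

- `SourceHealth`, `health_decision`, `coverage_label`, and `gap_priority` are hypothetical names.
- The one-hour freshness threshold and the 50% volume-drop ratio are placeholders for environment-specific values.
- `collector_healthy` stands in for independent health monitoring.
- Ranking a stale or drifted source as P3/Medium when there is no relevant threat intelligence is an assumption, because the guidance does not cover that case.

The usage lines feed in the evidence from the two fixtures in this patch.

```python
# Minimal sketch of the Step 1.5 health gate. Names, thresholds, and the P3
# fallback for stale/drifted sources without threat intelligence are assumptions.
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta

FRESHNESS_THRESHOLD = timedelta(hours=1)  # hypothetical, environment-specific
MIN_VOLUME_RATIO = 0.5  # hypothetical: below half the baseline is a "sharp drop"


@dataclass
class SourceHealth:
    last_ingestion: datetime
    review_time: datetime
    baseline_count: int
    current_count: int
    collector_healthy: bool
    replay_passed: bool | None  # None: no replay since the last pipeline change
    missing_fields: list[str] = field(default_factory=list)


def health_decision(ev: SourceHealth) -> str:
    """Return one of the template values: current/stale/drifted/theoretical."""
    too_old = ev.review_time - ev.last_ingestion > FRESHNESS_THRESHOLD
    zero_volume = ev.baseline_count > 0 and ev.current_count == 0
    if too_old or zero_volume or not ev.collector_healthy:
        return "stale"
    if ev.missing_fields:
        return "drifted"
    sharp_drop = ev.current_count < MIN_VOLUME_RATIO * ev.baseline_count
    if sharp_drop or ev.replay_passed is not True:
        return "theoretical"  # source present, but proof is incomplete
    return "current"


def coverage_label(decision: str) -> str:
    # Only fully evidenced sources count as Operational coverage.
    return "Operational" if decision == "current" else "Theoretical"


def gap_priority(decision: str, threat_intel_relevant: bool) -> str:
    if decision in ("stale", "drifted") and threat_intel_relevant:
        return "P2/High"
    if decision == "current":
        return "P4/Low"
    return "P3/Medium"  # assumption: also covers stale/drifted without TI


def ts(value: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


benign = SourceHealth(ts("2026-06-08T10:58:00Z"), ts("2026-06-08T11:00:00Z"),
                      baseline_count=4200, current_count=4187,
                      collector_healthy=True, replay_passed=True)
stale_edr = SourceHealth(ts("2026-06-01T03:10:00Z"), ts("2026-06-08T11:00:00Z"),
                         baseline_count=18000, current_count=0,
                         collector_healthy=False, replay_passed=None,
                         missing_fields=["CommandLine", "ParentImage"])

for name, ev in (("benign", benign), ("stale_edr", stale_edr)):
    decision = health_decision(ev)
    print(name, decision, coverage_label(decision), gap_priority(decision, True))
# benign current Operational P4/Low
# stale_edr stale Theoretical P2/High
```

The checks are ordered so that the most severe failure wins. A source that is both stale and drifted is reported as `stale`, which matches the fixture's expectation that zero-match results are treated as a telemetry failure first.
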
diff --git a/skills/secops/detection-engineering/SKILL.md b/skills/secops/detection-engineering/SKILL.md
index 975b3c66..c7d631c4 100644
--- a/skills/secops/detection-engineering/SKILL.md
+++ b/skills/secops/detection-engineering/SKILL.md
@@ -53,6 +53,7 @@ Before beginning, gather or confirm:
 
 - [ ] **Target ATT&CK technique(s):** The specific technique or sub-technique IDs to detect (e.g., T1059.001 -- PowerShell).
 - [ ] **Available log sources:** What telemetry is collected? (Windows Event Logs, Sysmon, EDR, cloud audit logs, proxy logs, DNS logs, firewall logs).
+- [ ] **Data-source health:** Last successful ingestion time, expected event volume, parser/schema version, required field coverage, and replay/canary status for each log source used by the detection.
 - [ ] **SIEM platform(s):** Target SIEM for rule deployment (Microsoft Sentinel, Splunk, Elastic, Chronicle, QRadar) -- determines Sigma backend conversion target.
 - [ ] **Environment context:** Operating systems, domain structure, cloud providers, key applications in the environment.
 - [ ] **Existing detection coverage:** Current rules, known gaps, previous false positive history for similar detections.
@@ -86,6 +87,28 @@ ATT&CK Technique Analysis:
 - Detection Scope:    [Sub-technique specific | Parent technique broad]
 ```
 
+### Step 1.5: Data Source Health and Telemetry Drift Gate
+
+Before treating a detection as operational, verify that each required log source is currently producing the fields the rule depends on. Even a correct Sigma rule provides only theoretical coverage when the collector is stale, the parser has changed, or the observed event volume has collapsed to zero.
+
+Record evidence for each required data source:
+
+| Evidence | Required check | Failure signal |
+|----------|----------------|----------------|
+| Last successful ingestion | Most recent event timestamp and pipeline ingestion timestamp are within the environment's freshness threshold | Last event or last pipeline success is stale |
+| Expected event volume | Current count for the same source, host group, and time window is plausible against a non-zero baseline | Volume drops to zero or falls sharply without an approved maintenance window |
+| Parser/schema version | Parser, field mapping, or data model version is known and compatible with the Sigma fields | Required fields renamed, missing, or mapped to a different type |
+| Required field coverage | Sample events contain the fields used by selections, filters, and output context | Rule depends on absent fields such as `CommandLine`, `Image`, `User`, or `EventID` |
+| Collector/forwarder health | Agent, forwarder, API connector, or cloud integration shows healthy status | Health monitor disabled, stale heartbeat, or failed connector job |
+| Replay/canary proof | Recent replay, atomic test, or synthetic event confirms the rule can match current telemetry | No replay proof after parser, sensor, or pipeline changes |
+
+Classification guidance:
+
+- Escalate to P2/High when the technique has relevant threat intelligence and a required data source is stale, shows zero observed volume against a non-zero baseline without explanation, or has parser drift.
+- Keep as P3/Medium when health evidence is incomplete but the data source appears present and a replay test is scheduled.
+- Downgrade to P4/Low only when independent health monitoring, fresh ingestion, schema compatibility, and replay/canary evidence prove that the apparent gap is operationally covered.
+- Mark coverage as `Theoretical` rather than `Operational` when source health, parser version, or field coverage evidence is missing.
+
 ### Step 2: Detection Logic Design
 
 Design the detection logic before writing the rule. Consider:
@@ -389,6 +412,11 @@ Produce detection engineering deliverables in this structure:
 | Target Coverage | [Operational / Robust] |
 | Validation Method | [Atomic Red Team test ID / manual test procedure] |
 
+### Data Source Health and Drift Evidence
+| Data Source | Expected Volume Window | Last Ingestion | Parser/Schema Version | Required Fields Present | Replay/Canary Result | Health Decision |
+|-------------|------------------------|----------------|-----------------------|-------------------------|----------------------|-----------------|
+| [Sysmon process_creation] | [baseline count / window] | [timestamp] | [version/hash] | [yes/no/list gaps] | [passed/failed/not run] | [current/stale/drifted/theoretical] |
+
 ### Deployment Notes
 - **Target SIEM:** [Platform]
 - **Converted Query:** [KQL/SPL/EQL equivalent if requested]
@@ -494,6 +522,10 @@ Detection rules are not write-once artifacts. Log sources change, environments e
 
 Overly broad or incorrect ATT&CK mappings undermine coverage analysis. A rule that detects a specific PowerShell obfuscation technique should map to T1059.001 (PowerShell) and potentially T1027 (Obfuscated Files or Information), not to the parent T1059 alone. Use sub-technique IDs when the detection is specific to a sub-technique. Validate mappings against the ATT&CK technique definition and procedure examples.
 
+### Pitfall 6: Treating Zero Matches as Coverage
+
+A rule that returns zero matches is not automatically clean coverage. It may mean the required data source is missing, the parser renamed a field, the collector stopped forwarding events, or the rule was converted to a backend query with incompatible field names. Always pair zero-match results with ingestion freshness, expected event volume, parser/schema version, and replay evidence before marking coverage as operational.
+
 ---
 
 ## 8. Prompt Injection Safety Notice
diff --git a/skills/secops/detection-engineering/tests/benign/current-telemetry-replay-schema-verified.md b/skills/secops/detection-engineering/tests/benign/current-telemetry-replay-schema-verified.md
new file mode 100644
index 00000000..fc7c5bb9
--- /dev/null
+++ b/skills/secops/detection-engineering/tests/benign/current-telemetry-replay-schema-verified.md
@@ -0,0 +1,43 @@
+# Benign Fixture: Current Telemetry With Replay and Schema Proof
+
+## Scenario
+
+A detection engineer reviews a Sigma rule for suspicious PowerShell encoded commands after a SIEM parser upgrade. The converted backend query has a low fire count, but telemetry health and replay evidence show that the data source is current and the rule fields still map correctly.
+
+## Evidence
+
+```yaml
+detection:
+  technique: T1059.001
+  sigma_fields:
+    - Image
+    - CommandLine
+    - ParentImage
+  required_data_source: sysmon_process_creation
+telemetry_health:
+  last_successful_ingestion: "2026-06-08T10:58:00Z"
+  review_time: "2026-06-08T11:00:00Z"
+  expected_event_volume:
+    baseline_window: "same weekday 09:00-11:00"
+    baseline_count: 4200
+    current_count: 4187
+  parser_schema:
+    current_version: "sysmon-process-v12"
+    schema_change_ticket: "DET-2241"
+    required_fields_present:
+      Image: true
+      CommandLine: true
+      ParentImage: true
+  collector_health:
+    heartbeat_age_minutes: 2
+    connector_status: "healthy"
+  replay_canary:
+    test_id: "ART-T1059.001-encoded-command"
+    run_time: "2026-06-08T10:45:00Z"
+    result: "matched expected rule"
+coverage_decision: "Operational"
+```
+
+## Expected Result
+
+The skill should allow operational coverage because fresh ingestion, plausible expected volume, compatible parser schema, required field coverage, healthy collector status, and replay proof are all present. Any remaining low fire count can be treated as a tuning observation rather than a missing-source finding.
diff --git a/skills/secops/detection-engineering/tests/vulnerable/stale-edr-parser-zero-match-coverage.md b/skills/secops/detection-engineering/tests/vulnerable/stale-edr-parser-zero-match-coverage.md
new file mode 100644
index 00000000..6721c53d
--- /dev/null
+++ b/skills/secops/detection-engineering/tests/vulnerable/stale-edr-parser-zero-match-coverage.md
@@ -0,0 +1,41 @@
+# Vulnerable Fixture: Stale EDR Parser Treated as Coverage
+
+## Scenario
+
+A team marks PowerShell command-line detection coverage as operational because the converted SIEM query returns zero matches across the previous seven days. The ATT&CK mapping and Sigma logic are reasonable, but the EDR process telemetry pipeline changed parser versions during an agent upgrade.
+
+## Evidence
+
+```yaml
+detection:
+  technique: T1059.001
+  sigma_fields:
+    - Image
+    - CommandLine
+    - ParentImage
+  required_data_source: edr_process_creation
+telemetry_health:
+  last_successful_ingestion: "2026-06-01T03:10:00Z"
+  review_time: "2026-06-08T11:00:00Z"
+  expected_event_volume:
+    baseline_window: "previous 7 daily business windows"
+    baseline_count_per_day: 18000
+    current_count_per_day: 0
+  parser_schema:
+    previous_version: "edr-process-v4"
+    current_version: "edr-process-v5"
+    field_drift:
+      CommandLine: "renamed to process.command_line"
+      ParentImage: "renamed to parent.process.path"
+  collector_health:
+    forwarder_heartbeat: "stale"
+    health_ticket: null
+  replay_canary:
+    last_test: "2026-05-15"
+    result: "not rerun after parser migration"
+coverage_decision: "Operational"
+```
+
+## Expected Result
+
+The skill should not accept `Operational` coverage. The zero-match result must be treated as evidence of a telemetry or parser failure until fresh ingestion, restored event volume, updated field mappings, collector health, and replay proof are provided. This should remain P2/High when the technique is relevant to active threat intelligence.