Core Insight: A CVE is not a decision.
A CVE becomes useful only after interpretation, enrichment, prioritization, and justification.
CVE-TRACE is a long-term personal research project that transforms raw CVE disclosures into structured decision trajectories for training and evaluating LLM-based vulnerability intelligence agents.
Rather than treating CVEs as isolated classification problems, CVE-TRACE models the full reasoning process required to operationalize vulnerability information.
Overview
Most existing CVE-focused machine learning approaches frame vulnerability analysis as a single-step prediction task (e.g., severity scoring or exploitability classification).
CVE-TRACE takes a different approach.
It reframes CVE handling as a multi-step decision-making process, explicitly capturing how security decisions are formed in practice:
Raw CVE → Normalize → Enrich → Decide → Explain
Each step is stored as a structured (state, action, observation) transition, enabling agent learning beyond answer imitation and toward policy-grounded reasoning.
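A minimal sketch of how such transitions could be represented in Python. The class and method names (`Transition`, `Trajectory`, `record`, `to_record`) are illustrative assumptions, not part of the CVE-TRACE codebase; the only thing taken from this README is the (state, action, observation) shape and the idea that each step's state follows from the previous step's observation.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Transition:
    """One (state, action, observation) step in a decision trajectory."""
    state: dict
    action: str
    observation: dict


@dataclass
class Trajectory:
    """Hypothetical container for the ordered steps of one CVE."""
    cve: str
    initial_state: dict
    steps: list = field(default_factory=list)

    def record(self, action, observation):
        # The state for each step is the previous observation,
        # or the raw CVE state for the very first step.
        state = self.steps[-1].observation if self.steps else self.initial_state
        self.steps.append(Transition(state, action, observation))

    def to_record(self):
        # Serialize to a plain dict that can be dumped as JSON.
        return {"cve": self.cve, "trajectory": [asdict(s) for s in self.steps]}


trajectory = Trajectory("CVE-YYYY-NNNN", {"raw_cve": "..."})
trajectory.record("normalize", {"schema": {}})
trajectory.record("enrich", {"signals": {"exploit_likelihood": 0.42}})
record = trajectory.to_record()
```

Chaining each state to the previous observation keeps every record replayable step by step, which is what lets an agent be trained on the intermediate decisions rather than only the final label.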
Why Decision Trajectories?
Security decisions are rarely made in one step.
They require:
Interpreting ambiguous vulnerability descriptions
Incorporating external risk signals
Applying organizational or operational policies
Justifying actions to human operators
CVE-TRACE encodes these intermediate reasoning steps directly, making it suitable for:
Training reasoning-capable agents
Evaluating decision faithfulness
Studying uncertainty-aware AI behavior in security contexts
Use Cases
Agent Training: Supervised fine-tuning (SFT), preference learning, tool-using agents
Vulnerability Triage Research: Understanding how real-world security decisions are formed
Security Decision Intelligence: Policy-grounded, explainable vulnerability handling
LLM Robustness Testing: Evaluating behavior under incomplete or uncertain information
What CVE-TRACE Is (and Isn't)
What it IS
A dataset and pipeline for generating CVE decision trajectories
A framework for training vulnerability triage agents
A bridge between static vulnerability data and operational reasoning
Explicitly designed for LLM + tool-based agents
What it IS NOT
A real-time detection system (IDS/IPS)
A CVSS replacement
An exploit prediction oracle
Tied to any specific SIEM, scanner, or vendor platform
Design Principles

| Principle | Description |
| --- | --- |
| Trajectory-first | Models how decisions are made, not just final labels |
| Signal-aware reasoning | Grounds decisions in external signals, not CVSS alone |
| Uncertainty-first | Treats "unknown" and "insufficient evidence" as valid outcomes |
| Policy-versioned labels | All decisions derived from explicit, versioned policies |
| Agent-ready | Every record can be replayed as an agent episode |

Data Sources
CVE-TRACE consumes publicly available vulnerability intelligence, including:
CVE / NVD disclosures (raw vulnerability descriptions)
Exploitation likelihood signals (probabilistic or observed)
Advisory metadata and version information
The pipeline is modular by design: new signal sources can be integrated without changing the trajectory format.
Trajectory Structure
Each CVE is converted into a multi-step decision trajectory.
Step 1: Ingest (Raw State)
CVE ID
Description
CVSS and metadata
Affected products (as disclosed)
Step 2: Normalize (Semantic Extraction)
Action: normalize_to_schema
Converts unstructured text into a stable, machine-readable schema:
Affected software
Version ranges
Vulnerability class
Preconditions and impact
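One way the target schema could look, as a hedged sketch. The field names below mirror the list above but are assumptions, as is the `UNKNOWN` sentinel; the actual `cve_schema.json` may differ. The function only coerces already-extracted fields into a fixed shape; the extraction from free text itself is out of scope here.

```python
from dataclasses import dataclass, field, asdict

UNKNOWN = "unknown"  # assumed sentinel for fields the text does not support


@dataclass
class NormalizedCVE:
    """Hypothetical schema; field names are illustrative."""
    affected_software: str = UNKNOWN
    version_ranges: list = field(default_factory=list)
    vulnerability_class: str = UNKNOWN
    preconditions: str = UNKNOWN
    impact: str = UNKNOWN


def normalize_to_schema(extracted):
    """Coerce loosely extracted fields into the fixed schema.

    Unrecognized keys are dropped, and missing or empty values stay
    'unknown' instead of being guessed.
    """
    known = {
        key: value
        for key, value in extracted.items()
        if key in NormalizedCVE.__dataclass_fields__ and value not in (None, "", [])
    }
    return asdict(NormalizedCVE(**known))


schema = normalize_to_schema(
    {"affected_software": "examplelib", "impact": "", "extra_field": 1}
)
```

Keeping "unknown" as an explicit value means a later step can distinguish "the disclosure did not say" from "the extractor failed silently".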
Step 3: Enrich (Signal Join)
Action: enrich_with_signals
Joins external risk signals:
Exploitation likelihood estimates
Known exploitation status
Temporal context (age, patch availability)
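The join can be pictured as a set of lookups keyed by CVE ID. This is a sketch under stated assumptions: the parameter names, the output keys, and the in-memory dict and set inputs are all hypothetical stand-ins for whatever signal sources the pipeline actually reads, and the dates are placeholder values.

```python
from datetime import date


def enrich_with_signals(cve_id, published, likelihoods, exploited_ids, patch_status, today):
    """Join per-CVE risk signals by CVE ID.

    Missing values are kept as None so that later steps can tell
    'no data' apart from 'low risk'.
    """
    return {
        "exploit_likelihood": likelihoods.get(cve_id),   # None if no estimate exists
        "known_exploited": cve_id in exploited_ids,      # membership in an observed-exploitation set
        "patch_available": patch_status.get(cve_id),     # None if patch status is unknown
        "age_days": (today - published).days,            # temporal context
    }


signals = enrich_with_signals(
    "CVE-YYYY-NNNN",
    published=date(2024, 1, 1),   # placeholder date
    likelihoods={"CVE-YYYY-NNNN": 0.42},
    exploited_ids=set(),
    patch_status={},
    today=date(2024, 1, 31),      # placeholder date
)
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the step deterministic, so the same inputs always reproduce the same trajectory.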
Step 4: Decide (Policy-Grounded Decision)
Action: decide_action
Possible actions:
PATCH_NOW
PATCH_SOON
MITIGATE
MONITOR
ACCEPT_RISK
Decisions are derived from explicit policy logic, not implicit model intuition.
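To illustrate what "explicit policy logic" might mean in code, here is a minimal rule-based sketch. The thresholds, the rule order, the signal keys, and the choice to map a missing likelihood estimate to MONITOR are all assumptions made for this example; they are not the contents of `policy_v1.yaml`.

```python
# Hypothetical policy; in the repository this would presumably live in a
# versioned YAML file. All threshold values are illustrative.
POLICY = {
    "version": "v1",
    "patch_soon_min_likelihood": 0.3,
    "monitor_min_likelihood": 0.05,
}


def decide_action(signals, policy=POLICY):
    """Map enriched signals to one of the five actions, with a reason."""
    likelihood = signals.get("exploit_likelihood")
    if signals.get("known_exploited"):
        if signals.get("patch_available"):
            action, reason = "PATCH_NOW", "known exploitation and a patch exists"
        else:
            action, reason = "MITIGATE", "known exploitation but no confirmed patch"
    elif likelihood is None:
        # Uncertainty is surfaced explicitly instead of being treated as low risk.
        action, reason = "MONITOR", "insufficient evidence: no likelihood estimate"
    elif likelihood >= policy["patch_soon_min_likelihood"]:
        action, reason = "PATCH_SOON", "likelihood at or above patch-soon threshold"
    elif likelihood >= policy["monitor_min_likelihood"]:
        action, reason = "MONITOR", "likelihood at or above monitor threshold"
    else:
        action, reason = "ACCEPT_RISK", "likelihood below all thresholds"
    return {"decision": action, "policy_version": policy["version"], "reason": reason}


decision = decide_action({"exploit_likelihood": 0.42, "known_exploited": False})
```

Recording `policy_version` next to each decision is what makes labels comparable across policy revisions: the same signals can be re-decided under a newer policy and the two outcomes diffed.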
Step 5: Explain (Operational Output)
Action: generate_explanation
Produces operator-facing artifacts:
Patch ticket drafts
Risk summaries
Decision justifications
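A patch ticket draft could be as simple as a template filled from the earlier steps. The sketch below assumes hypothetical input shapes (`affected_software`, `exploit_likelihood`, `known_exploited`, `policy_version`, `reason`); an LLM-written explanation would replace the template in practice, but the grounding inputs would be the same.

```python
def generate_explanation(cve_id, schema, signals, decision):
    """Render a plain-text patch ticket draft from earlier step outputs."""
    likelihood = signals.get("exploit_likelihood")
    # Report missing estimates as 'unknown' rather than inventing a number.
    likelihood_text = "unknown" if likelihood is None else f"{likelihood:.2f}"
    lines = [
        f"[{decision['decision']}] {cve_id}: {schema.get('affected_software', 'unknown')}",
        f"Exploitation likelihood: {likelihood_text}",
        f"Known exploited: {signals.get('known_exploited', 'unknown')}",
        f"Policy version: {decision.get('policy_version', 'unknown')}",
        f"Justification: {decision.get('reason', 'not recorded')}",
    ]
    return {"ticket_text": "\n".join(lines)}


ticket = generate_explanation(
    "CVE-YYYY-NNNN",
    {"affected_software": "examplelib"},
    {"exploit_likelihood": 0.42, "known_exploited": False},
    {"decision": "PATCH_SOON", "policy_version": "v1", "reason": "..."},
)
```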
Example Trajectory

```json
{
  "cve": "CVE-YYYY-NNNN",
  "trajectory": [
    {
      "state": { "raw_cve": "..." },
      "action": "normalize",
      "observation": { "schema": { "...": "..." } }
    },
    {
      "state": { "schema": { "...": "..." } },
      "action": "enrich",
      "observation": { "signals": { "exploit_likelihood": 0.42 } }
    },
    {
      "state": { "signals": { "...": "..." } },
      "action": "decide",
      "observation": { "decision": "PATCH_SOON" }
    },
    {
      "state": { "decision": "PATCH_SOON" },
      "action": "explain",
      "observation": { "ticket_text": "..." }
    }
  ]
}
```
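A record in this shape can be sanity-checked before it is used for training. The validator below is a sketch: the function name and the specific checks are assumptions, and it uses the short action names from the example record (`normalize`, `enrich`, `decide`, `explain`). Because the example elides nested values, it compares only the top-level keys that link one step's observation to the next step's state.

```python
EXPECTED_ACTIONS = ["normalize", "enrich", "decide", "explain"]
ALLOWED_DECISIONS = {"PATCH_NOW", "PATCH_SOON", "MITIGATE", "MONITOR", "ACCEPT_RISK"}


def validate_trajectory(record):
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = []
    steps = record.get("trajectory", [])
    actions = [step.get("action") for step in steps]
    if actions != EXPECTED_ACTIONS:
        problems.append(f"unexpected action sequence: {actions}")
    # Each state should carry the same top-level keys as the previous observation.
    for prev, cur in zip(steps, steps[1:]):
        if set(cur.get("state", {})) != set(prev.get("observation", {})):
            problems.append(
                f"state of '{cur.get('action')}' does not follow from '{prev.get('action')}'"
            )
    for step in steps:
        decision = step.get("observation", {}).get("decision")
        if decision is not None and decision not in ALLOWED_DECISIONS:
            problems.append(f"unknown decision: {decision}")
    return problems


example = {
    "cve": "CVE-YYYY-NNNN",
    "trajectory": [
        {"state": {"raw_cve": "..."}, "action": "normalize",
         "observation": {"schema": {"...": "..."}}},
        {"state": {"schema": {"...": "..."}}, "action": "enrich",
         "observation": {"signals": {"exploit_likelihood": 0.42}}},
        {"state": {"signals": {"...": "..."}}, "action": "decide",
         "observation": {"decision": "PATCH_SOON"}},
        {"state": {"decision": "PATCH_SOON"}, "action": "explain",
         "observation": {"ticket_text": "..."}},
    ],
}
problems = validate_trajectory(example)
```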
Learning Objectives
CVE-TRACE enables agents to:
Parse ambiguous security text reliably
Distinguish theoretical severity from practical risk
Ground decisions in external signals
Avoid hallucinated certainty
Generate operator-appropriate explanations
Suitable for:
Supervised fine-tuning (SFT)
Preference learning
Tool-augmented agent training
Evaluation of reasoning faithfulness
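For supervised fine-tuning, one plausible conversion is to turn every transition into its own prompt/completion pair, so the model is trained on each intermediate step and not only the final decision. The prompt layout and the `prompt`/`completion` keys below are assumptions for illustration, not a format the project prescribes.

```python
import json


def to_sft_examples(record):
    """Turn one trajectory record into per-step prompt/completion pairs."""
    examples = []
    for step in record["trajectory"]:
        prompt = (
            f"CVE: {record['cve']}\n"
            f"Action: {step['action']}\n"
            f"State:\n{json.dumps(step['state'], sort_keys=True)}"
        )
        # The observation is the supervised target for this step.
        completion = json.dumps(step["observation"], sort_keys=True)
        examples.append({"prompt": prompt, "completion": completion})
    return examples


sft_examples = to_sft_examples(
    {
        "cve": "CVE-YYYY-NNNN",
        "trajectory": [
            {"state": {"signals": {"exploit_likelihood": 0.42}},
             "action": "decide",
             "observation": {"decision": "PATCH_SOON"}},
        ],
    }
)
```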
Repository Structure

```
cve-trace/
├── data/
│   ├── raw/                 # Raw CVE/advisory data
│   ├── normalized/          # Schema-normalized CVEs
│   └── trajectories/        # Agent-ready decision trajectories
├── schemas/
│   └── cve_schema.json
├── policies/
│   ├── policy_v1.yaml       # Baseline decision policy
│   └── policy_v2.yaml       # Revised policy
├── pipelines/
│   ├── ingest.py
│   ├── normalize.py
│   ├── enrich.py
│   └── build_trajectory.py
├── evaluation/
│   ├── auto_scoring.py
│   └── holdout_split.py
└── README.md
```
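As an illustration of what a holdout split over trajectories might involve, the sketch below assigns each CVE to a split by hashing its ID. This is an assumption about one reasonable approach, not a description of what `holdout_split.py` actually does; the function name and the default fraction are hypothetical.

```python
import hashlib


def assign_split(cve_id, holdout_fraction=0.2):
    """Deterministically assign a CVE to 'train' or 'holdout' by hashing its ID."""
    digest = hashlib.sha256(cve_id.encode("utf-8")).hexdigest()
    bucket = (int(digest, 16) % 10_000) / 10_000  # stable value in [0, 1)
    return "holdout" if bucket < holdout_fraction else "train"


split = assign_split("CVE-YYYY-NNNN")
```

Hashing the ID keeps all steps of one CVE in the same split and gives the same assignment on every run, without storing a separate split file.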
Research Scope & Status
Status: Active personal research project
Audience: Security researchers, LLM/agent researchers, SecOps engineers
Intended Use: Research, experimentation, reproducibility
This project is not intended to replace human security judgment, but to study and improve how AI systems support it.
License & Ethics
All data sources are public vulnerability disclosures
Outputs reflect policy-based reasoning, not guarantees of exploitation or safety
Use responsibly: security decisions have real-world consequences
Future Directions
Multi-agent setups (Normalizer → Triage → Writer)
Counterfactual trajectory generation
Integration with SBOM-aware contexts
Benchmark tasks for vulnerability reasoning agents
Contributing
This is a research-focused project. Feedback, discussion, and contributions are welcome; please open an issue or submit a pull request.
Citation
If you use CVE-TRACE in your research, please cite:
```bibtex
@misc{cve-trace,
  title  = {CVE-TRACE: Decision Trajectories for Vulnerability Intelligence Agents},
  author = {Your Name},
  year   = {2024},
  url    = {https://github.com/your-username/cve-trace}
}
```