suh004757/CVE-Trace

Core Insight: A CVE is not a decision.

A CVE becomes useful only after interpretation, enrichment, prioritization, and justification.

CVE-TRACE is a long-term personal research project that transforms raw CVE disclosures into structured decision trajectories for training and evaluating LLM-based vulnerability intelligence agents.

Rather than treating CVEs as isolated classification problems, CVE-TRACE models the full reasoning process required to operationalize vulnerability information.

🎯 Overview

Most existing CVE-focused machine learning approaches frame vulnerability analysis as a single-step prediction task (e.g., severity scoring or exploitability classification).

CVE-TRACE takes a different approach.

It reframes CVE handling as a multi-step decision-making process, explicitly capturing how security decisions are formed in practice:

Raw CVE → Normalize → Enrich → Decide → Explain

Each step is stored as a structured (state, action, observation) transition, enabling agent learning beyond answer imitation and toward policy-grounded reasoning.
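As a minimal sketch, one transition and the trajectory that holds it could be represented in Python as below. The class names `Transition` and `Trajectory`, and their methods, are illustrative assumptions that mirror the (state, action, observation) triple; they are not taken from the project's code.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Transition:
    """One (state, action, observation) step. Hypothetical container."""
    state: dict[str, Any]
    action: str
    observation: dict[str, Any]


@dataclass
class Trajectory:
    """Ordered transitions for a single CVE. Hypothetical container."""
    cve: str
    steps: list[Transition] = field(default_factory=list)

    def append(self, state: dict[str, Any], action: str, observation: dict[str, Any]) -> None:
        self.steps.append(Transition(state, action, observation))

    def actions(self) -> list[str]:
        """Action names in the order they were taken."""
        return [step.action for step in self.steps]


traj = Trajectory(cve="CVE-YYYY-NNNN")
traj.append({"raw_cve": "..."}, "normalize", {"schema": {}})
traj.append({"schema": {}}, "enrich", {"signals": {"exploit_likelihood": 0.42}})
print(traj.actions())
```

Keeping each step as its own record is what lets a learner see intermediate states instead of only the final label.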

🔍 Why Decision Trajectories?

Security decisions are rarely made in one step.

They require:

Interpreting ambiguous vulnerability descriptions

Incorporating external risk signals

Applying organizational or operational policies

Justifying actions to human operators

CVE-TRACE encodes these intermediate reasoning steps directly, making it suitable for:

Training reasoning-capable agents

Evaluating decision faithfulness

Studying uncertainty-aware AI behavior in security contexts

🚀 Use Cases

Agent Training: Supervised fine-tuning (SFT), preference learning, tool-using agents

Vulnerability Triage Research: Understanding how real-world security decisions are formed

Security Decision Intelligence: Policy-grounded, explainable vulnerability handling

LLM Robustness Testing: Evaluating behavior under incomplete or uncertain information

✅ What CVE-TRACE Is (and Isn’t)

✔ What it IS

A dataset and pipeline for generating CVE decision trajectories

A framework for training vulnerability triage agents

A bridge between static vulnerability data and operational reasoning

Explicitly designed for LLM + tool-based agents

✖ What it IS NOT

A real-time detection system (IDS/IPS)

A CVSS replacement

An exploit prediction oracle

Tied to any specific SIEM, scanner, or vendor platform

🏗️ Design Principles

| Principle | Description |
| --- | --- |
| Trajectory-first | Models how decisions are made, not just final labels |
| Signal-aware reasoning | Grounds decisions in external signals, not CVSS alone |
| Uncertainty-first | Treats "unknown" and "insufficient evidence" as valid outcomes |
| Policy-versioned labels | All decisions derived from explicit, versioned policies |
| Agent-ready | Every record can be replayed as an agent episode |

📊 Data Sources

CVE-TRACE consumes publicly available vulnerability intelligence, including:

CVE / NVD disclosures (raw vulnerability descriptions)

Exploitation likelihood signals (probabilistic or observed)

Advisory metadata and version information

The pipeline is modular by design: new signal sources can be integrated without changing the trajectory format.
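One way to get that modularity, sketched under assumptions: every source implements the same small interface and the pipeline merges whatever the sources return. `SignalSource`, `StaticLikelihoodSource`, and `collect_signals` are hypothetical names, and the lookup table holds a placeholder value rather than real data.

```python
from typing import Any, Protocol


class SignalSource(Protocol):
    """Hypothetical interface: maps a CVE ID to a dict of signal fields."""
    name: str

    def lookup(self, cve_id: str) -> dict[str, Any]: ...


class StaticLikelihoodSource:
    """Toy source backed by an in-memory table (placeholder values only)."""
    name = "exploit_likelihood"

    def __init__(self, table: dict[str, float]) -> None:
        self._table = table

    def lookup(self, cve_id: str) -> dict[str, Any]:
        # A missing entry is reported as None rather than guessed.
        return {"exploit_likelihood": self._table.get(cve_id)}


def collect_signals(cve_id: str, sources: list[SignalSource]) -> dict[str, Any]:
    """Merge the output of every source into one flat signals dict."""
    signals: dict[str, Any] = {}
    for source in sources:
        signals.update(source.lookup(cve_id))
    return signals


sources = [StaticLikelihoodSource({"CVE-YYYY-NNNN": 0.42})]
print(collect_signals("CVE-YYYY-NNNN", sources))
```

Adding a source then means appending one more object to `sources`; the merged dict still lands in the same `signals` slot of the trajectory.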

🔄 Trajectory Structure

Each CVE is converted into a multi-step decision trajectory.

Step 1: Ingest (Raw State)

CVE ID

Description

CVSS and metadata

Affected products (as disclosed)

Step 2: Normalize (Semantic Extraction)

Action: normalize_to_schema

Converts unstructured text into a stable, machine-readable schema:

Affected software

Version ranges

Vulnerability class

Preconditions and impact
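A rough sketch of what the normalized record might look like follows. The actual schema lives in `schemas/cve_schema.json` and is not reproduced here, so the `NormalizedCVE` class and its field names are assumptions based only on the list above. The text-extraction step itself (for example an LLM call) is out of scope; the function takes already-extracted fields.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Optional


@dataclass
class NormalizedCVE:
    """Hypothetical target schema; field names follow the list above."""
    cve_id: str
    affected_software: Optional[str] = None
    version_ranges: Optional[list[str]] = None
    vulnerability_class: Optional[str] = None
    preconditions: Optional[str] = None
    impact: Optional[str] = None

    def missing_fields(self) -> list[str]:
        """Names of fields the extractor could not fill."""
        return [name for name, value in asdict(self).items() if value is None]


def normalize_to_schema(cve_id: str, extracted: dict[str, Any]) -> NormalizedCVE:
    """Keep only known fields; anything absent stays None instead of being guessed."""
    allowed = {f.name for f in fields(NormalizedCVE)} - {"cve_id"}
    kept = {key: value for key, value in extracted.items() if key in allowed}
    return NormalizedCVE(cve_id=cve_id, **kept)


record = normalize_to_schema(
    "CVE-YYYY-NNNN",
    {
        "affected_software": "example-lib",
        "vulnerability_class": "buffer overflow",
        "unrelated_key": "dropped",
    },
)
print(record.missing_fields())
```

Leaving gaps as `None` keeps the record honest about what the description did not say, which later steps can treat as missing evidence.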

Step 3: Enrich (Signal Join)

Action: enrich_with_signals

Joins external risk signals:

Exploitation likelihood estimates

Known exploitation status

Temporal context (age, patch availability)
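The join could be sketched as a function that takes the normalized schema plus the three kinds of signals listed above and emits one transition. The signal field names (`known_exploited`, `age_days`, `patch_available`) and the dates in the usage lines are illustrative assumptions, not the project's actual format.

```python
from datetime import date
from typing import Any, Optional


def enrich_with_signals(
    schema: dict[str, Any],
    exploit_likelihood: Optional[float],
    known_exploited: Optional[bool],
    published: date,
    patch_available: Optional[bool],
    today: date,
) -> dict[str, Any]:
    """Build one enrich transition. All signal field names are illustrative."""
    signals = {
        "exploit_likelihood": exploit_likelihood,
        "known_exploited": known_exploited,
        # Temporal context: age of the disclosure in whole days.
        "age_days": (today - published).days,
        "patch_available": patch_available,
    }
    return {
        "state": {"schema": schema},
        "action": "enrich",
        "observation": {"signals": signals},
    }


step = enrich_with_signals(
    schema={"affected_software": "example-lib"},
    exploit_likelihood=0.42,
    known_exploited=None,  # unknown stays unknown
    published=date(2024, 1, 1),
    patch_available=True,
    today=date(2024, 1, 31),
)
print(step["observation"]["signals"])
```

Passing `today` in explicitly, instead of reading the clock inside the function, keeps the generated trajectory reproducible.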

Step 4: Decide (Policy-Grounded Decision)

Action: decide_action

Possible actions:

PATCH_NOW

PATCH_SOON

MITIGATE

MONITOR

ACCEPT_RISK

Decisions are derived from explicit policy logic, not implicit model intuition.
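To make "explicit policy logic" concrete, here is a hedged sketch of a rule-based decision function. The thresholds, the rule order, and the `POLICY` dict are placeholders invented for illustration; they are not the contents of `policy_v1.yaml`. This toy policy never emits `ACCEPT_RISK`, and it maps missing evidence to `MONITOR`, which is only one possible way to handle unknowns.

```python
from typing import Any, Optional

# Hypothetical policy: version string and thresholds are placeholders.
POLICY = {
    "version": "example-0",
    "patch_now_likelihood": 0.7,
    "patch_soon_likelihood": 0.3,
}


def decide_action(signals: dict[str, Any], policy: dict[str, Any] = POLICY) -> dict[str, Any]:
    """Return a decision plus the policy version and rule that produced it."""
    likelihood: Optional[float] = signals.get("exploit_likelihood")
    known_exploited = signals.get("known_exploited")
    patch_available = signals.get("patch_available")

    if known_exploited is True:
        decision = "PATCH_NOW" if patch_available else "MITIGATE"
        rule = "known_exploited"
    elif likelihood is None:
        # No evidence: watch rather than assert a risk level.
        decision, rule = "MONITOR", "insufficient_evidence"
    elif likelihood >= policy["patch_now_likelihood"]:
        decision = "PATCH_NOW" if patch_available else "MITIGATE"
        rule = "high_likelihood"
    elif likelihood >= policy["patch_soon_likelihood"]:
        decision, rule = "PATCH_SOON", "medium_likelihood"
    else:
        decision, rule = "MONITOR", "low_likelihood"

    return {"decision": decision, "policy_version": policy["version"], "rule": rule}


print(decide_action({"exploit_likelihood": 0.42, "patch_available": True}))
```

Recording the policy version and the rule that fired alongside the label is what allows the same CVE to be relabeled when the policy changes.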

Step 5: Explain (Operational Output)

Action: generate_explanation

Produces operator-facing artifacts:

Patch ticket drafts

Risk summaries

Decision justifications
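As a stand-in for this step, the sketch below renders a ticket from a fixed template. A real pipeline might use an LLM here instead; the template, the function signature, and the decision-record keys (`policy_version`, `rule`) are assumptions made for illustration.

```python
from typing import Any


def generate_explanation(cve_id: str, signals: dict[str, Any], decision: dict[str, Any]) -> dict[str, str]:
    """Render a plain-text ticket from a decision record (template is illustrative)."""
    likelihood = signals.get("exploit_likelihood")
    # Report missing evidence as "unknown" instead of inventing a number.
    likelihood_text = "unknown" if likelihood is None else f"{likelihood:.2f}"
    lines = [
        f"[{decision['decision']}] {cve_id}",
        f"Exploit likelihood: {likelihood_text}",
        f"Policy: {decision['policy_version']} (rule: {decision['rule']})",
    ]
    return {"ticket_text": "\n".join(lines)}


ticket = generate_explanation(
    "CVE-YYYY-NNNN",
    {"exploit_likelihood": 0.42},
    {"decision": "PATCH_SOON", "policy_version": "example-0", "rule": "medium_likelihood"},
)
print(ticket["ticket_text"])
```

Because every line of the ticket is derived from recorded fields, the justification can be checked against the decision step that preceded it.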

📋 Example Trajectory

    {
      "cve": "CVE-YYYY-NNNN",
      "trajectory": [
        {
          "state": { "raw_cve": "..." },
          "action": "normalize",
          "observation": { "schema": { "...": "..." } }
        },
        {
          "state": { "schema": { "...": "..." } },
          "action": "enrich",
          "observation": { "signals": { "exploit_likelihood": 0.42 } }
        },
        {
          "state": { "signals": { "...": "..." } },
          "action": "decide",
          "observation": { "decision": "PATCH_SOON" }
        },
        {
          "state": { "decision": "PATCH_SOON" },
          "action": "explain",
          "observation": { "ticket_text": "..." }
        }
      ]
    }
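A record in this shape can be sanity-checked before it is used for training. The sketch below assumes two things that the example suggests but the README does not state as rules: every step carries `state`, `action`, and `observation`, and each step's state has the same top-level keys as the previous step's observation. `check_trajectory` is a hypothetical helper.

```python
import json

RECORD = json.loads("""
{"cve": "CVE-YYYY-NNNN", "trajectory": [
  {"state": {"raw_cve": "..."}, "action": "normalize",
   "observation": {"schema": {"...": "..."}}},
  {"state": {"schema": {"...": "..."}}, "action": "enrich",
   "observation": {"signals": {"exploit_likelihood": 0.42}}},
  {"state": {"signals": {"...": "..."}}, "action": "decide",
   "observation": {"decision": "PATCH_SOON"}},
  {"state": {"decision": "PATCH_SOON"}, "action": "explain",
   "observation": {"ticket_text": "..."}}
]}
""")

REQUIRED_KEYS = {"state", "action", "observation"}


def check_trajectory(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    steps = record.get("trajectory", [])
    for i, step in enumerate(steps):
        missing = REQUIRED_KEYS - step.keys()
        if missing:
            problems.append(f"step {i}: missing {sorted(missing)}")
    # Assumed invariant: a step's state carries the keys that the
    # previous step produced as its observation.
    for i in range(1, len(steps)):
        prev_keys = set(steps[i - 1].get("observation", {}))
        state_keys = set(steps[i].get("state", {}))
        if prev_keys != state_keys:
            problems.append(
                f"step {i}: state keys {sorted(state_keys)} "
                f"do not match previous observation keys {sorted(prev_keys)}"
            )
    return problems


print(check_trajectory(RECORD))
```

The check compares only top-level keys, so placeholder values such as `"..."` pass; a stricter version would compare the values as well.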

🎓 Learning Objectives

CVE-TRACE enables agents to:

Parse ambiguous security text reliably

Distinguish theoretical severity from practical risk

Ground decisions in external signals

Avoid hallucinated certainty

Generate operator-appropriate explanations

Suitable for:

Supervised fine-tuning (SFT)

Preference learning

Tool-augmented agent training

Evaluation of reasoning faithfulness
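For supervised fine-tuning, each step of a trajectory can be turned into its own training example. The sketch below shows one possible conversion; the prompt layout and the `prompt`/`completion` keys are assumptions, since the README does not specify a training format.

```python
import json
from typing import Any


def to_sft_examples(record: dict[str, Any]) -> list[dict[str, str]]:
    """One prompt/completion pair per step (the formatting is an assumption)."""
    examples = []
    for step in record["trajectory"]:
        prompt = (
            f"CVE: {record['cve']}\n"
            f"Action: {step['action']}\n"
            f"State: {json.dumps(step['state'], sort_keys=True)}"
        )
        # The target is the recorded observation, serialized deterministically.
        completion = json.dumps(step["observation"], sort_keys=True)
        examples.append({"prompt": prompt, "completion": completion})
    return examples


demo = {
    "cve": "CVE-YYYY-NNNN",
    "trajectory": [
        {"state": {"signals": {"exploit_likelihood": 0.42}}, "action": "decide",
         "observation": {"decision": "PATCH_SOON"}},
        {"state": {"decision": "PATCH_SOON"}, "action": "explain",
         "observation": {"ticket_text": "..."}},
    ],
}
for example in to_sft_examples(demo):
    print(example["prompt"])
    print(example["completion"])
```

Splitting by step means a model is supervised on the intermediate observations as well as the final decision.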

📁 Repository Structure

    cve-trace/
    ├── data/
    │   ├── raw/            # Raw CVE/advisory data
    │   ├── normalized/     # Schema-normalized CVEs
    │   └── trajectories/   # Agent-ready decision trajectories
    ├── schemas/
    │   └── cve_schema.json
    ├── policies/
    │   ├── policy_v1.yaml  # Baseline decision policy
    │   └── policy_v2.yaml  # Revised policy
    ├── pipelines/
    │   ├── ingest.py
    │   ├── normalize.py
    │   ├── enrich.py
    │   └── build_trajectory.py
    ├── evaluation/
    │   ├── auto_scoring.py
    │   └── holdout_split.py
    └── README.md

🔬 Research Scope & Status

Status: Active personal research project

Audience: Security researchers, LLM/agent researchers, SecOps engineers

Intended Use: Research, experimentation, reproducibility

This project is not intended to replace human security judgment, but to study and improve how AI systems support it.

⚖️ License & Ethics

All data sources are public vulnerability disclosures

Outputs reflect policy-based reasoning, not guarantees of exploitation or safety

Use responsibly: security decisions have real-world consequences

🔮 Future Directions

Multi-agent setups (Normalizer → Triage → Writer)

Counterfactual trajectory generation

Integration with SBOM-aware contexts

Benchmark tasks for vulnerability reasoning agents

🤝 Contributing

This is a research-focused project. Feedback, discussion, and contributions are welcome; please open an issue or submit a pull request.

📚 Citation

If you use CVE-TRACE in your research, please cite:

    @misc{cve-trace,
      title  = {CVE-TRACE: Decision Trajectories for Vulnerability Intelligence Agents},
      author = {Your Name},
      year   = {2024},
      url    = {https://github.com/your-username/cve-trace}
    }
