CVEProject/cve-ref-archival

CVE Program Reference Archiver

Summary

This is a pilot program to explore how to archive URL references found in CVE Records. This repository and README are early-stage.
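As a rough illustration of what "URL references found in CVE Records" could mean in practice, here is a minimal sketch that pulls reference URLs out of one record. It assumes the record is a parsed CVE JSON 5.x document whose references live under `containers.cna.references` and `containers.adp[].references`. The function name `iter_reference_urls` is hypothetical and not part of this repository.

```python
from typing import Any, Iterator


def iter_reference_urls(record: dict[str, Any]) -> Iterator[str]:
    """Yield each distinct reference URL in a CVE Record, in first-seen order."""
    containers = record.get("containers", {})
    # Assumed layout: one "cna" container plus an optional list of "adp" containers.
    sections = [containers.get("cna", {})] + list(containers.get("adp", []))
    seen: set[str] = set()
    for section in sections:
        for ref in section.get("references", []):
            url = ref.get("url")
            # Skip entries without a URL and URLs already yielded.
            if url and url not in seen:
                seen.add(url)
                yield url


if __name__ == "__main__":
    # Made-up record for illustration only.
    sample = {
        "containers": {
            "cna": {"references": [{"url": "https://example.com/advisory"}]},
            "adp": [{"references": [{"url": "https://example.org/analysis"}]}],
        }
    }
    for url in iter_reference_urls(sample):
        print(url)
```

Deduplicating at this stage keeps a crawler or an external service from being asked to archive the same URL twice for one record.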

Why Bother?

Link rot happens, sometimes intentionally and sometimes fairly soon after a CVE Record is published. CVE is a valuable, historic, and reasonably comprehensive public data archive that has outlasted many of the original sources of published vulnerability information. Archiving those sources (public URLs and their content) is therefore worthwhile.

Documentation

An Automation Working Group (AWG) summary, Summary of AWG discussion and requirements, a must read.

Slides called CVE Reference Investigations that document some of the extent of the link rot problem, plus a threat vector involving CVE ID typosquatting.

Other slides outlining a somewhat more “in-house” solution (which is not the current plan, but things could change).

A flow chart, not necessarily accurate.

The docs/ directory.

Phases

No need to do everything at once, which may even be unwise, as we’ll learn along the way.

Phase 1

ArchiveBox for local collection. The collection is not served or shared in Phase 1, so only the project team and Secretariat are likely to have access. ArchiveBox uses the Django development web server, which we should probably not expose to the internet.

We could also submit references to the Internet Archive Wayback Machine. This can be “fire and forget” or “be nicer and check dead links and check already submitted and recent-enough URLs before submitting.” The Wayback Machine has features to manage duplicate and "overly young" references.
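The "be nicer" option could look something like the sketch below. It asks the Wayback Machine availability endpoint (`https://archive.org/wayback/available`) for the closest snapshot and decides whether a new submission seems worthwhile. The 30-day `MAX_AGE` threshold and the function names are assumptions for illustration, not project decisions. The actual submission step and dead-link checks are left out.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

AVAILABILITY_API = "https://archive.org/wayback/available"
MAX_AGE = timedelta(days=30)  # hypothetical "recent enough" policy


def latest_snapshot_time(payload: dict[str, Any]) -> Optional[datetime]:
    """Return the closest snapshot's time (UTC), or None if none is reported."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    # Wayback timestamps have the form YYYYMMDDhhmmss.
    parsed = datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


def needs_submission(
    payload: dict[str, Any], now: datetime, max_age: timedelta = MAX_AGE
) -> bool:
    """True if there is no snapshot, or the closest one is older than max_age."""
    snapshot = latest_snapshot_time(payload)
    return snapshot is None or now - snapshot > max_age


def fetch_availability(url: str, timeout: float = 30.0) -> dict[str, Any]:
    """Query the availability endpoint for one URL (performs a network call)."""
    query = urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen(f"{AVAILABILITY_API}?{query}", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    target = "https://example.com/"
    payload = fetch_availability(target)
    if needs_submission(payload, datetime.now(timezone.utc)):
        print(f"submit: {target}")
    else:
        print(f"skip (recent snapshot exists): {target}")
```

Keeping the decision logic separate from the network call makes the policy easy to test and to change once the project settles on what "recent enough" means.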

Phase 2

Review/reconsider ArchiveBox: continue with it, replace it with a different project, replace it with in-house software, switch to a paid external provider, or stay on-prem. Decide on and implement a way to share the local archive: a public web site, torrents, or access only for registered CNAs/Program members? Consider public services other than the Internet Archive, if such exist.

Phase N

The future is unclear, but once we have something in place, archiving shouldn’t require a lot of major dynamic changes. Operate, add storage, manage and tweak crawler(s) and external destinations.
