Formalize versionTypes #362

Open
darakian opened this issue Nov 14, 2024 · 19 comments
Labels
Needs Discussion Discuss in a future QWG meeting or on mailing list section:affected_product Schema location is affected or product

Comments

@darakian

darakian commented Nov 14, 2024

At the moment the versionType

"versionType": {
    "type": "string",
    "description": "The version numbering system used for specifying the range. This defines the exact semantics of the comparison (less-than) operation on versions, which is required to understand the range itself. 'Custom' indicates that the version type is unspecified and should be avoided whenever possible. It is included primarily for use in conversion of older data files.",
    "minLength": 1,
    "maxLength": 128,
    "examples": [
        "custom",
        "git",
        "maven",
        "python",
        "rpm",
        "semver"
    ]
}

identifier is fairly loosely defined. The description reads:

This defines the exact semantics of the comparison (less-than) operation on versions, which is required to understand the range itself.

However it does not inform the reader of the exact semantics of any of the example version types (nor does it lay out a complete list of types).
In essence that means every versionType value is just fancy unstructured text.
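To illustrate why the comparison semantics matter, here is a minimal sketch (not part of the schema; both function names are made up for illustration). The same pair of version strings orders one way as plain text and the other way as dotted integers, so a range is ambiguous unless the versionType pins down which rule applies.

```python
def less_than_as_text(a: str, b: str) -> bool:
    """Compare versions as plain strings (lexicographic order)."""
    return a < b


def less_than_as_dotted_numbers(a: str, b: str) -> bool:
    """Compare versions as tuples of integers, e.g. "1.10.0" -> (1, 10, 0).

    Assumes every dot-separated part is a plain non-negative integer.
    """
    return tuple(int(p) for p in a.split(".")) < tuple(int(p) for p in b.split("."))


# The same question, "is 1.9.0 less than 1.10.0?", under the two rules.
text_result = less_than_as_text("1.9.0", "1.10.0")
numeric_result = less_than_as_dotted_numbers("1.9.0", "1.10.0")
print(text_result, numeric_result)
```

As text, "1.9.0" sorts after "1.10.0" because "9" is greater than "1"; as dotted integers it sorts before.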

I suggest that for a future revision of the spec we create and maintain a list of supported version schemes. Custom likely needs to remain for backwards compatibility, but I suggest that we also encourage users of the custom type to propose new versionTypes. I would advise against version types that are named for languages, e.g. python, since code written in a language ends up in packages and executables, which tend not to share the language's own idea of a version.

If I understand the intent of the python identifier correctly (big if) then I would suggest that it be renamed to pypa and follow the convention here
https://packaging.python.org/en/latest/specifications/version-specifiers/

For the purpose of long-term maintainability I suggest that whatever versionTypes are formally adopted have their full spec copied into this repo. That lets the spec point at a file we control and say "we support exactly this", and we won't be subject to an upstream spec being updated without anyone noticing. The downside is that the versionType specs will need updates over time, and maybe versions for versionType (yo dawg). Ideally each version type also has an associated ordering (might be hard for a git type), but failing that, versions could be presented as a list.

Semver, for instance, is on version 2.0.0 and no change log (that I can find) exists from 1.x.y. So it would be most correct to rename semver -> semver 2.0.0, or 1.x.y if that's desired instead.
https://semver.org/

I do not propose that we attempt to catalog a large number of version schemes. Quite a few exist out there and most really don't matter. Some may as well be strings, e.g.
https://central.sonatype.org/publish/requirements/#correct-coordinates

The version can be an arbitrary string and can not end in -SNAPSHOT, since this is the reserved string used to identify versions that are currently in development.

Some more fun ones
https://nesbitt.io/2024/06/24/from-zerover-to-semver-a-comprehensive-list-of-versioning-schemes-in-open-source.html

I think a manageable approach would be to support custom (as a catch-all), semver (2.0.0), datever, and maybe a few others where we know the people publishing data can reliably provide them. Then, as CNAs need or want more options, we ask them to propose new version types and we add them to the list.

@MrMegaZone
Collaborator

Today 'versionType' is basically free text - you can put anything in there you want. So I could do something like "F5-BIG-IP-Versioning" or something silly like that. Today I use 'Custom' because what we do with that product doesn't fit any of the other buckets. (It is kind of a mutant semver - but NOT semver compliant.)

Are you suggesting we make this more of a defined list of allowed strings and remove the free text ability today? I could see how that could make validation easier. I'm not sure it would fix a common problem we have today though - which is the 'versionType' being provided, but being incorrect. For example, stating 'semver' when it isn't actually semver - a mistake I made on F5 CVEs for a while, before someone called me on it and I switched to 'custom'. I've seen similar issues with others.

If we do get prescriptive about what the string can be, I'd suggest we explicitly add 'x_' as a prefix to allow additional versionTypes to be used while pending addition to an official list/future schema.

I'm not sure how to solve the issue of mismatched versionType declarations. 'semver' is one of the easier types to validate, and even that isn't necessarily 'simple' to do. Should we have validation for any supported types we add to ensure they aren't misused?
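As a rough idea of what validating 'semver' could look like, here is a minimal sketch that checks a string against the regular expression suggested in the semver.org FAQ for SemVer 2.0.0. The function name is made up for illustration, and this only checks syntax: it cannot tell whether a vendor actually follows SemVer's compatibility rules.

```python
import re

# Regular expression suggested in the semver.org FAQ (numbered-groups variant).
# re.ASCII keeps \d limited to 0-9.
SEMVER_2_RE = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$",
    re.ASCII,
)


def is_semver_2(version: str) -> bool:
    """Return True if `version` is syntactically valid SemVer 2.0.0."""
    # fullmatch also rejects a trailing newline, which `$` alone would allow.
    return SEMVER_2_RE.fullmatch(version) is not None


for candidate in ["1.2.3", "1.0.0-alpha.1+build.5", "1.2", "01.2.3", "17.1.0.3"]:
    print(candidate, is_semver_2(candidate))
```

A four-part version such as 17.1.0.3 is rejected, which is the kind of "semver-like but not semver" mismatch described above.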

@darakian
Author

darakian commented Nov 21, 2024

Are you suggesting we make this more of a defined list of allowed strings and remove the free text ability today?

I'm suggesting we make a defined list for sure. Removing free text could be a goal, but maybe we can get around to that in a century or two, once we observe that most CVE publication is covered by the version types we formalize. In the short term we keep free text, observe how it's used, and attempt to make a better way for the patterns we observe.

Using your F5-BIG-IP-Versioning example (or maybe F5-BIG-IP-Versioning-1.0.0 👀): rather than simply putting that label in versionType in an ad hoc way, you (or whoever at F5) would open a PR to suggest F5-BIG-IP-Versioning-1.0.0 be added to the list, and in that PR we would collectively define how F5-BIG-IP-Versioning-1.0.0 versions are specified and how they are ordered. After the new type is merged, you publish your CVEs with those versions and everyone has a shared understanding of how to interpret them.
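One way to picture "a list where each entry defines how versions are specified and how they are ordered" is a small registry. This is only a sketch under assumptions: the `VersionType` class, the `check` function, and the `example-dotted-1.0.0` type are all hypothetical and are not taken from the CVE schema or from any vendor's real scheme.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class VersionType:
    name: str
    is_valid: Callable[[str], bool]              # how versions are specified
    sort_key: Optional[Callable[[str], tuple]]   # how they are ordered (None: undefined)


def _dotted_ints_valid(version: str) -> bool:
    # Every dot-separated part must be a non-empty run of ASCII digits.
    return all(p.isascii() and p.isdigit() for p in version.split("."))


def _dotted_ints_key(version: str) -> tuple:
    return tuple(int(p) for p in version.split("."))


# Hypothetical registry: "custom" accepts any non-empty string and has no ordering.
REGISTRY: Dict[str, VersionType] = {
    "custom": VersionType("custom", lambda v: len(v) > 0, None),
    "example-dotted-1.0.0": VersionType(
        "example-dotted-1.0.0", _dotted_ints_valid, _dotted_ints_key
    ),
}


def check(version_type: str, version: str) -> str:
    """Classify a (versionType, version) pair against the registry."""
    entry = REGISTRY.get(version_type)
    if entry is None:
        return "unknown versionType"
    if not entry.is_valid(version):
        return "invalid version for this versionType"
    return "ok"


print(check("example-dotted-1.0.0", "17.1.2"))
print(check("example-dotted-1.0.0", "17.1-beta"))
print(check("F5-BIG-IP-Versioning", "17.1.2"))
```

With something like this, an unregistered label is reported as unknown rather than silently accepted, and a registered type carries both its validation rule and its ordering.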

I'm not sure it would fix a common problem we have today though - which is the 'versionType' being provided, but being incorrect.

Agreed. We would need to have validation in place and to collectively correct errors when we observe them. That said, we can't correct anything until we define what "correct" is, hence this issue.

I'd suggest we explicitly add 'x_' as a prefix to allow additional versionTypes to be used while pending addition to an official list/future schema.

I think I disagree there. IMO custom should be used until a type is formalized. Adding x_ would allow for ad hoc creation of version schemas, which would very likely be poorly defined, and we would circle back to the problems we have today. It is better in my mind to define a format before usage rather than after.

Should we have validation for any supported types we add to ensure they aren't misused?

Yes.

@ryOF65aErb

ryOF65aErb commented Nov 21, 2024

(George) Among other things, disallow relational operators as part of the version string.

@darakian
Author

(George) Among other things, disallow relational operators as part of the version string.

Say more. Are you talking about <, > and company? What issues are you hitting with them and would they work if their usage was well defined?

@ryOF65aErb

ryOF65aErb commented Nov 22, 2024 via email

@darakian
Author

Can you share some examples? From the conversation in the QWG meeting yesterday I thought most of us were aligned that we wanted to support ranges (and hence relational operators). Is the issue you're hitting more to do with a lack of consistency or a misuse of the operators?

@ryOF65aErb

ryOF65aErb commented Nov 22, 2024 via email

@darakian
Author

I prefer for values to always refer to a single unambiguous point in version history.

Agreed. That's part of the idea of defining specific version types so that the reader of the version information knows.

If we have such endpoint variables that define endpoints and definitions of those variables that describe how they relate to bounding of vulnerable (or not vulnerable) ranges, then it seems redundant to also allow relational operators.

I think that's a semantic difference. If we define a type which is a tuple (x, y) and the rules of the type says that the values should be read as x is an inclusive lower bound and y is an exclusive upper bound then that's equivalent to defining the type as >= x, < y.
Though a tuple would probably be easier to build validation for in practice.
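A small sketch of that equivalence, assuming versions are simple dotted integers (the helper names are made up for illustration). The tuple form and the operator form give the same answer for every version tested; the tuple form just needs no expression parsing.

```python
import operator
from typing import Tuple

# Only the two operators needed for this illustration.
OPS = {">=": operator.ge, "<": operator.lt}


def parse(version: str) -> Tuple[int, ...]:
    """Parse "1.2.0" into (1, 2, 0). Assumes dotted non-negative integers."""
    return tuple(int(p) for p in version.split("."))


def in_range_tuple(version: str, bounds: Tuple[str, str]) -> bool:
    """bounds = (x, y): x is an inclusive lower bound, y an exclusive upper bound."""
    lower, upper = bounds
    return parse(lower) <= parse(version) < parse(upper)


def in_range_operators(version: str, expr: str) -> bool:
    """expr looks like ">= 1.2.0, < 1.4.0"; every clause must hold."""
    v = parse(version)
    for clause in expr.split(","):
        op, bound = clause.split()
        if not OPS[op](v, parse(bound)):
            return False
    return True


versions = ["1.1.9", "1.2.0", "1.3.7", "1.4.0"]
by_tuple = [in_range_tuple(v, ("1.2.0", "1.4.0")) for v in versions]
by_operators = [in_range_operators(v, ">= 1.2.0, < 1.4.0") for v in versions]
print(by_tuple, by_operators)
```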

@ryOF65aErb

ryOF65aErb commented Nov 27, 2024 via email

@darakian
Author

darakian commented Dec 2, 2024

@ryOF65aErb for the purposes of this issue I'd like to keep things focused away from the custom type since that's more or less unstructured by definition. I'm more curious what new types would be useful to you (and to others).

@ryOF65aErb

ryOF65aErb commented Dec 2, 2024 via email

@darakian
Author

darakian commented Dec 2, 2024

what % are “custom”.

Functionally 100% today since there are no rules for any of the types. Good news is that there's room for improvement 👍

I see automation as job 1.

Totally aligned. Again, please share any specific version specs you think would be useful.

@zmanion
Contributor

zmanion commented Dec 9, 2024

I support selecting/developing an initial set of properly-specified version types. Open to what these might be. Some thoughts and additional input, not expressing a strong preference:

  • SemVer
  • purl (All types? Some types? Just an example of types? purl only works with defined types.)
  • vers (came from purl)
  • vers-like (from CSAF, a degenerate use of vers)
  • The vulnogram UI semi-supports specific version types but I don't think it validates or enforces anything
  • The OSV schema requires an ecosystem, which IIUC means an ecosystem-specific version syntax

It makes sense to choose something that is already in use and/or demand.

@zmanion
Contributor

zmanion commented Dec 9, 2024

While this issue is about version types, this implicates how ranges work (e.g., operators, wildcards) and even status, which may vary by version type. I'd prefer that CVE maintains only one set of range rules if possible. Also a reminder that CVE doesn't include a "fixed" status and we've been living on implications for 20+ years.

darakian added a commit to darakian/cve-schema that referenced this issue Dec 9, 2024
First crack at adding a formal version type in response to
CVEProject#362 (comment)
Any others which are agreed upon should be spun up in their own PRs so that conversations in the PRs can be kept on topic

Happy to expand this if people think the full semver spec should be in this repo as well. I went back and forth on that.
@darakian
Author

darakian commented Dec 9, 2024

It makes sense to choose something that is already in use and/or demand.

Agreed. We should invent as little as possible.

While this issue is about version types, this implicates how ranges work...

Indeed. The version schemes will need to take a stance on what makes a < b for any given a and b. In math this is referred to as an ordering (https://en.wikipedia.org/wiki/Total_order), which some languages have built-in support for (https://doc.rust-lang.org/std/cmp/trait.Ord.html).
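For comparison with Rust's Ord, here is a rough Python sketch of a total order for SemVer 2.0.0 precedence using `functools.total_ordering`. The `SemVer` class is illustrative only, it assumes its input is already syntactically valid, and it is not the implementation proposed in any PR.

```python
from functools import total_ordering


@total_ordering
class SemVer:
    """Orders versions by SemVer 2.0.0 precedence. Input is assumed valid."""

    def __init__(self, text: str):
        self.text = text
        core, _, _build = text.partition("+")   # build metadata is ignored
        core, _, pre = core.partition("-")
        self.release = tuple(int(p) for p in core.split("."))
        self.pre = tuple(pre.split(".")) if pre else ()

    def _key(self):
        # Numeric identifiers sort below alphanumeric ones and compare as
        # integers; alphanumeric identifiers compare as ASCII text. A shorter
        # identifier list sorts first when it is a prefix of a longer one.
        pre_key = tuple(
            (0, int(p), "") if p.isdigit() else (1, 0, p) for p in self.pre
        )
        # A version without a pre-release outranks the same version with one.
        return (self.release, 0 if self.pre else 1, pre_key)

    def __eq__(self, other):
        return isinstance(other, SemVer) and self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()

    def __hash__(self):
        return hash(self._key())


# Precedence example from the SemVer 2.0.0 spec, fed in reverse order.
expected = [
    "1.0.0-alpha", "1.0.0-alpha.1", "1.0.0-alpha.beta", "1.0.0-beta",
    "1.0.0-beta.2", "1.0.0-beta.11", "1.0.0-rc.1", "1.0.0",
]
ordered = [v.text for v in sorted(SemVer(t) for t in reversed(expected))]
print(ordered)
```

Because every pair of versions is comparable under `_key`, ranges such as "a <= v < b" have a single unambiguous meaning for this type.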

SemVer

PR up. #371
Let's have a semver-specific conversation over in that PR 👍

purl (All types? Some types? Just an example of types? purl only works with defined types.)

As far as I was aware purl is not a versioning scheme.

vers (came from purl)
vers-like (from CSAF, a degenerate use of vers)

Is one of these a superset of the other? What's with the pip:/npm: etc...? Is this really just multiple versioning schemes masquerading as a single one? Seems cleaner to break them out if so.

The vulnogram UI semi-supports specific version types but I don't think it validates or enforces anything

Are they documented anywhere?

The OSV schema requires an ecosystem, which IIUC means an ecosystem-specific version syntax

There's a conversation about that going on right now. I think the approach that's going to be taken is that the ecosystem implicitly defines a version scheme/ordering and that's whatever the backing package registry says it is.

Also a reminder that CVE doesn't include a "fixed" status and we've been living on implications for 20+ years.

That can be done independently of defining versioning/ordering, so let's kick that to a different issue.

I'd prefer that CVE maintains only one set of range rules if possible.

I don't see that being possible. This Jenkins plugin, for instance, has releases which would be ordered differently depending on whether you consider them in lexicographical/semver-ish ordering or release date ordering. Specifically 1365.1367.va_3b_b_89f8a_95b_ and 1365.v4778ca_84b_de5. As far as I'm aware the release date ordering is the correct ordering for that project.
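A sketch of the two orderings for those version strings. The lexicographic result is just Python string comparison. The release order list is an assumption: the thread does not give release dates, so the order below is only inferred from the statement that the two orderings disagree. The helper name is made up for illustration.

```python
from typing import List

a = "1365.1367.va_3b_b_89f8a_95b_"
b = "1365.v4778ca_84b_de5"

# Plain string order: after the shared "1365." prefix, "1" sorts before "v".
lexicographic = sorted([b, a])

# ASSUMPTION: inferred release order (oldest first), not taken from real
# release dates. An explicit list like this is the "present versions as a
# list" fallback for schemes with no computable ordering.
ASSUMED_RELEASE_ORDER = [b, a]


def sort_by_release_list(versions: List[str], release_order: List[str]) -> List[str]:
    """Order versions by their position in an explicit release list."""
    position = {v: i for i, v in enumerate(release_order)}
    return sorted(versions, key=position.__getitem__)


by_release = sort_by_release_list([a, b], ASSUMED_RELEASE_ORDER)
print(lexicographic)
print(by_release)
```

Under the assumed release order the two results are reversed, so a single range rule applied to both schemes would mark different versions as affected.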

@ccoffin ccoffin added the Needs Discussion Discuss in a future QWG meeting or on mailing list label Dec 12, 2024
@jayjacobs jayjacobs added the section:affected_product Schema location is affected or product label Jan 3, 2025
@jayjacobs
Collaborator

jayjacobs commented Jan 3, 2025

Note: this should probably be discussed in context of #263, #264, #279, and #280

@ccoffin
Collaborator

ccoffin commented Jan 16, 2025

Discussed in 2025-01-16 QWG. No objections to moving forward with enumerated list of a few common version types. Might start with custom and semver2 at the very least.

@jayjacobs
Collaborator

Pasted something similar in #263, but as of today there are 308 unique strings in the "versionType" field. Here are the top 20 strings and some counts for the values in that field:

   versionType             instances   cves cves_within_last_year
 1 NA                         479374 212216                 13557     # field defined but empty 
 2 semver                      42215  16107                  7901
 3 custom                     125322  37790                  5767
 4 git                         21810   4694                  3134
 5 original_commit_for_fix      4371   4371                  2985
 6 rpm                          3384    340                   136
 7 maven                         727    421                   104
 8 patch                         313    115                    51
 9 release                       140     60                    26
10 python                         92     33                    24
11 SPL                            19     19                    19
12 server                         19     16                    16
13 2024.3                         15     15                    13
14 PI                             32     12                    12
15 rpm, exe                       29     29                    11
16 Patch                          46     26                    11
17 Server                         11     11                    10
18 8121                           11     11                     8
19 2023.2.3                        7      7                     7
20 7271                            7      7                     7
# and 288 more rows
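For anyone who wants to reproduce a tally like this, here is a minimal sketch. It assumes records follow the CVE JSON 5.x layout (containers.cna.affected[].versions[].versionType) and it counts a missing or empty field as "NA", which may not match exactly how the table above was built. The sample records are made up for illustration and are not real CVE data.

```python
from collections import Counter
from typing import Iterable


def tally_version_types(records: Iterable[dict]) -> Counter:
    """Count versionType strings across all affected version entries."""
    counts: Counter = Counter()
    for record in records:
        cna = record.get("containers", {}).get("cna", {})
        for product in cna.get("affected", []):
            for entry in product.get("versions", []):
                # Assumption: missing or empty versionType is reported as "NA".
                counts[entry.get("versionType") or "NA"] += 1
    return counts


def fold_case(counts: Counter) -> Counter:
    """Merge labels that differ only by case, e.g. "Patch" and "patch"."""
    folded: Counter = Counter()
    for name, n in counts.items():
        folded[name.lower()] += n
    return folded


# Hypothetical sample records, for illustration only.
SAMPLE_RECORDS = [
    {"containers": {"cna": {"affected": [{"versions": [
        {"version": "1.0.0", "versionType": "semver"},
        {"version": "2.0"},
    ]}]}}},
    {"containers": {"cna": {"affected": [{"versions": [
        {"version": "7", "versionType": "Patch"},
        {"version": "8", "versionType": "patch"},
    ]}]}}},
]

raw = tally_version_types(SAMPLE_RECORDS)
folded = fold_case(raw)
print(raw)
print(folded)
```

Case folding alone would merge rows such as patch/Patch and server/Server in the table above.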

@jayjacobs
Collaborator

We reopened #173 to specifically track the addition of PURL into the schema.

  • We need to decide what versionTypes we will support initially in the affected array.
  • Need to decide whether PURL should be part of that or a separate data structure.


6 participants