Releases: aboutcode-org/scancode-toolkit
v32.0.2
This is a minor license update release with:
- new and updated licenses in LicenseDB
- license-expression V30.1.1 with support for the new licenses
What's Changed
- Add new licenses to licenseDB by @AyanSinhaMahapatra in #3414
- Release Prep v32.0.2 by @AyanSinhaMahapatra in #3415
- Add doc redirects by @AyanSinhaMahapatra in #3413
Full Changelog: v32.0.1...v32.0.2
v32.0.1
This is a minor bugfix release.
There are fixes for two issues in this release:
- #3407: typecode had an improper import of ctypes.utils; this is fixed in
  the new typecode release v30.0.1.
- #3408: the setup.cfg and setup-mini.cfg files were not aligned for plugin
  entrypoints.
What's Changed
- Release prep v32.0.1 by @AyanSinhaMahapatra in #3410
Full Changelog: v32.0.0...v32.0.1
v32.0.0
v32 of ScanCode is all about improved license detections!
We have more licenses and rules, and major updates on post-processing matches to license detections.
We also have major improvements in package license detections and unknown references, along with top level detection
summaries for licenses, and reference data for the licenses detected too. There are also a couple of API changes due to
model changes in license data.
See also https://github.com/nexB/scancode.io/ for a complete, customizable SCA solution using ScanCode and
https://github.com/nexB/scancode-workbench/releases for visualizing data generated by ScanCode Toolkit.
Important API changes:
This is a major release with major API and output format changes and significant
feature updates.
In particular the output format has changed for the licenses and packages, and
also for some of the command line options.
The output format version is now 3.0.0.
See https://github.com/nexB/scancode-toolkit/milestone/15 for more details on this release.
Visit https://github.com/nexB/scancode-toolkit/discussions/3406 to discuss this release.
Package detection:
- Update GemfileLockParser to track the gem which the Gemfile.lock is for,
  which we assign to the new GemfileLockParser.primary_gem field. Update
  GemfileLockHandler.parse() to handle the case where there is a primary gem
  detected from a gemfile.lock. If there is a primary gem, a single Package
  is created and the detected gem data within the gemfile.lock are assigned as
  dependencies. If there is no primary gem, then all of the dependencies are
  collected into a Package with no name and yielded.
- Fix issue where dependencies were not reported when scanning an extracted
  Python project by modifying BaseExtractedPythonLayout.assemble() to favor
  using package data from a PKG-INFO file from an egg-info directory. Package
  data from a PKG-INFO file from an egg-info directory contains the dependency
  information collected from the requirements.txt file alongside PKG-INFO.
- Fix issue where we were returning an incorrect purl package type for
  cocoapods: pods was being returned as the purl type, but it should be
  cocoapods instead. See
  https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#cocoapods
- Code for parsing a Maven POM, npm package.json, freebsd manifest and haxelib
  JSON has been separated into two functions: one that creates a PackageData
  object from the parsed Resource, and another that calls the previous function
  and yields the PackageData. This was done so that we can use the package
  manifest data parsing code outside of the scancode-toolkit context in other
  libraries.
- The PackageData model now includes a holder field, which is populated with
  holder data extracted from the copyright field if copyright data is present;
  otherwise it remains empty.
- DatafileHandlers now have a classmethod named get_top_level_resources(),
  which is supposed to yield the top-level Resources of a Package codebase,
  relative to a Package manifest file. maven.MavenPomXmlHandler is the first
  DatafileHandler that has this method implemented.
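
As a rough illustration of the primary gem handling described in this list, the sketch below shows the two outcomes. It is not the actual GemfileLockHandler code: the Package dataclass and the build_package helper are hypothetical stand-ins, and gems are reduced to plain names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Package:
    # Hypothetical, simplified stand-in for ScanCode's package model.
    name: Optional[str] = None
    version: Optional[str] = None
    dependencies: List[str] = field(default_factory=list)


def build_package(primary_gem: Optional[Tuple[str, str]], gems: List[str]) -> Package:
    """Return one Package for the gems detected in a lockfile.

    primary_gem is a (name, version) tuple, or None when the lockfile
    does not identify the gem it belongs to.
    """
    if primary_gem:
        name, version = primary_gem
        # A primary gem exists: it becomes the package, and the detected
        # gems are attached as its dependencies.
        return Package(name=name, version=version, dependencies=list(gems))
    # No primary gem: collect all dependencies into a nameless package.
    return Package(dependencies=list(gems))
```

With a primary gem the result is a named package; without one the package name stays empty and only the dependencies are reported.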
License detection:
- The SPDX license list has been updated to the latest v3.20.
- This is a major update to license detection where we now combine one or more
  license matches in a larger license detection. This approach improves the
  accuracy of license detection and removes a larger number of false positive
  or ambiguous license detections. See #2878 for details.
- There is a new license_detections codebase level attribute with all the
  unique license detections in the whole scan, both in resources and packages.
  This has the 3 attributes also present in package/resource level license
  detections: license_expression, identifier and detection_log (present
  optionally if the --license-diagnostics option is enabled), with an
  additional attribute:
  - count: Number of times in the codebase this unique license detection
    was encountered.
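
To make the idea of unique detections with a count concrete, here is a minimal sketch that groups detection dicts by their identifier. It only illustrates the shape of the data; how ScanCode computes this internally is not described in these notes, and the summarize_detections name is hypothetical.

```python
from collections import Counter


def summarize_detections(detections):
    """Group detections by identifier and count how often each occurs.

    detections is an iterable of dicts that carry at least the
    license_expression and identifier keys named in the release notes.
    """
    detections = list(detections)
    counts = Counter(d["identifier"] for d in detections)
    # Remember the expression that belongs to each identifier.
    expressions = {d["identifier"]: d["license_expression"] for d in detections}
    return [
        {
            "identifier": identifier,
            "license_expression": expressions[identifier],
            "count": count,
        }
        for identifier, count in counts.items()
    ]
```

Each entry in the result corresponds to one unique detection, in order of first appearance.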
- The data structure of the JSON output has changed for licenses at file level:
  - The licenses attribute is deleted.
  - A new license_detections attribute contains license detections in that
    file. This object has three attributes: license_expression, identifier and
    matches. matches is a list of license matches and is roughly the same as
    licenses in the previous version, with additional structure changes
    detailed below. identifier is the detected license expression with a UUID
    generated from the content of matches, such that it is unique for unique
    detections. There is also another attribute, detection_log, with
    diagnostics information if the --license-diagnostics option is enabled.
  - A new attribute license_clues contains license matches with the same data
    structure as the matches attribute in license_detections. This contains
    license matches that are mere clues and were not considered to be a
    proper conclusive license detection.
  - The license_expressions list of license expressions is deleted and
    replaced by a single detected_license_expression expression. Similarly,
    spdx_license_expressions was removed and replaced by
    detected_license_expression_spdx.
  - See the license updates documentation at
    https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-resource
    for examples and details.
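
For consumers of the JSON output, the file level changes mean reading different keys. The sketch below assumes a file entry is a plain dict using the attribute names listed in these notes; the file_license_summary helper and the keys of its return value are hypothetical.

```python
def file_license_summary(resource):
    """Summarize the license data of one file entry from a v32 JSON scan."""
    # Missing or null attributes are treated as empty lists.
    detections = resource.get("license_detections") or []
    clues = resource.get("license_clues") or []
    return {
        # Single expressions replace the old license_expressions lists.
        "expression": resource.get("detected_license_expression"),
        "expression_spdx": resource.get("detected_license_expression_spdx"),
        "detection_expressions": [d["license_expression"] for d in detections],
        # Matches now live inside each detection, not in a licenses list.
        "match_count": sum(len(d.get("matches", [])) for d in detections),
        "clue_count": len(clues),
    }
```

Code written against the old licenses list would need to iterate over the matches of each detection instead.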
- The data structure of license attributes in package_data and the codebase
  level packages has been updated accordingly:
  - There is a new license_detections attribute for the primary, top-level
    declared licenses of a package and an other_license_detections attribute
    for the other secondary detections.
  - The license_expression is replaced by the declared_license_expression and
    other_license_expression attributes with their SPDX counterparts
    declared_license_expression_spdx and other_license_expression_spdx.
    These expressions are parallel to detections.
  - The declared_license attribute is renamed extracted_license_statement
    and is now a YAML-encoded string, which can be parsed to recreate the
    original extracted license statement. Previously this used to be nested
    Python objects (lists, dicts and strings), but now this is always a YAML
    string. See the license updates documentation at
    https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-package
    for examples and details.
- The license matches structure has changed: we used to report one match for
  each license key of a matched license expression. We now report instead one
  single match for each matched license expression, and list the license keys
  as a licenses attribute. This avoids data duplication. Inside each match, we
  list each match and matched rule attributes directly, avoiding nesting. See
  the license updates documentation at
  https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#licensematch-result-data
  for examples and details.
- There are new codebase level attributes with --license-references to report
  reference license metadata and texts once for each license matched across
  the scan; we now have two codebase level attributes: license_references and
  license_rule_references that list unique detected licenses and license
  rules. This reference data is also removed from license matches at all
  levels, i.e. from codebase, package and resource level license detections
  and resource level license clues, irrespective of this CLI option being
  used, i.e. by default with --licenses. See the license updates
  documentation at
  https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#comparision-before-after-license-references
  for examples and details.
- We replaced the scancode --reindex-licenses command line option with a new
  separate command named scancode-reindex-licenses.
  - The --reindex-licenses-for-all-languages CLI option is also moved to the
    scancode-reindex-licenses command as an option --all-languages.
  - We can now detect licenses using custom license texts and license rules
    stored in a directory or packaged as a plugin for consistent reuse and
    deployment.
  - There is an --additional-directory option with the
    scancode-reindex-licenses command to add the licenses from a directory.
  - There is also an --only-builtin option to use only builtin licenses,
    ignoring any additional license plugins.
  - See #480 for more details.
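
Scripts that used to call scancode --reindex-licenses need to invoke the new command instead. The helper below is a hypothetical sketch that only assembles the argument list from the option names given in these notes. It assumes that --additional-directory takes the directory path as its next argument and that --only-builtin belongs to the same command; whether the options can be combined is not stated here.

```python
def reindex_command(all_languages=False, additional_directory=None, only_builtin=False):
    """Build the argument list for the scancode-reindex-licenses command."""
    args = ["scancode-reindex-licenses"]
    if all_languages:
        # Replaces the old --reindex-licenses-for-all-languages option.
        args.append("--all-languages")
    if additional_directory:
        # Assumption: the option is followed by the directory path.
        args.extend(["--additional-directory", str(additional_directory)])
    if only_builtin:
        args.append("--only-builtin")
    return args
```

The resulting list can be passed to subprocess.run(..., check=True) in an environment where ScanCode v32 is installed.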
- We combined the license data file and text file of each license in a single
  file with a .LICENSE extension. The .yml data file is now included at the
  top of each .LICENSE file as "YAML frontmatter". The same applies to license
  rules and their .RULE and .yml files. This halves the number of data files
  from about 60,000 to 30,000. Git line history is preserved for the combined
  text + yml files.
  - See #3049
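
A combined file of this kind can be split back into its data and text parts. The sketch below assumes the common frontmatter convention of a block delimited by lines containing only "---" at the top of the file; that delimiter and the split_frontmatter helper are assumptions, not details confirmed by these notes, and the YAML itself is left unparsed.

```python
def split_frontmatter(content):
    """Split a combined file into (frontmatter, text).

    Assumes the frontmatter is delimited by lines containing only "---".
    Returns ("", content) when the file does not start with such a block.
    """
    lines = content.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", content
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything between the two delimiters is the YAML data.
            frontmatter = "\n".join(lines[1:index])
            text = "\n".join(lines[index + 1:])
            return frontmatter, text
    # No closing delimiter: treat the whole content as text.
    return "", content
```

The returned frontmatter string could then be handed to a YAML parser of your choice.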
- There is a new console script scancode-license-data to export license data
  in JSON, YAML and HTML, with indexes and a static website for use
  in t...
v31.2.6
This is a minor hotfix release.
- This fixes a crash when parsing a .deb Debian package filename, reported in #3259
Full Changelog: v31.2.5...v31.2.6
v32.0.0rc4
What's Changed
- Fix #3250: Invalid SPDX with empty file: no SHA1 by @vargenau in #3279
- Add docs, changelog and authors in CONTRIBUTION and fix typos and errors by @OctoPie23 in #3204
- Silence pyicu warning by @AyanSinhaMahapatra in #3280
- Fix licenses in HTML output by @AyanSinhaMahapatra in #3275
- Fix misc license detection related bugs by @AyanSinhaMahapatra in #3299
- Add copyright holder field to PackageData model by @keshav-space in #3302
- Merge latest skeleton into scancode by @AyanSinhaMahapatra in #3305
- New licenses and license rules by @AyanSinhaMahapatra in #3309
- Update documentation for v32 by @AyanSinhaMahapatra in #3292
- Get valid yaml output by @AyanSinhaMahapatra in #3220
- Fix-up the category of the 'ms-cla' license by @fviernau in #3318
- Release prep V32.0.0rc4 by @AyanSinhaMahapatra in #3336
- Update release script to remove ubuntu18 by @AyanSinhaMahapatra in #3337
New Contributors
- @OctoPie23 made their first contribution in #3204
- @keshav-space made their first contribution in #3302
Full Changelog: v32.0.0rc3...v32.0.0rc4
v31.2.5
This is a minor bug fix release.
- Backport changes from #3218
- Drop python 3.7
Full Changelog: v31.2.4...v31.2.5
v32.0.0rc3
This is the third release candidate for v32.0.0 with two major updates:
- we have changed the way we report license detections. See #3286 (comment) for more details on this.
- added support for SPDX license list 3.20, adding several new licenses and detection rules.
What's Changed
- Add new and improve existing licenses by @AyanSinhaMahapatra in #3271
- Improve License Detection reporting by @AyanSinhaMahapatra in #3286
- Release v32.0.0rc3 prep by @AyanSinhaMahapatra in #3291
Full Changelog: v32.0.0rc2...v32.0.0rc3
v32.0.0rc2
This is the second release candidate for v32.0.0 with a few bug fixes, license rule additions, and updates in the release script now generating app archives for more python versions across Linux/Windows/MacOS.
What's Changed
- Work around heisen-failures in CI by @pombredanne in #3207
- Add HERE Proprietary rule for pom.xml files by @bennati in #3212
- Add required phrase to JSR rule by @bennati in #3218
- Fix choking license detection post-processing #3245 by @AyanSinhaMahapatra in #3247
- Build app archives for all python versions by @AyanSinhaMahapatra in #3232
- Bump version to v32.0.0rc2 by @AyanSinhaMahapatra in #3262
Full Changelog: v32.0.0rc1...v32.0.0rc2
v32.0.0rc1
This is a major new release with API breaking changes.
v32.0.0rc1 is the first release candidate and we expect to have a few more.
Important API changes:
This is a major release with major API and output format changes and significant
feature updates.
In particular, we changed the output format for the licenses and packages, and
we changed some of the command line options.
The output format version is now 3.0.0.
Package detection:
- Update GemfileLockParser to track the gem which the Gemfile.lock is for,
  which we assign to the new GemfileLockParser.primary_gem field. Update
  GemfileLockHandler.parse() to handle the case where there is a primary gem
  detected from a gemfile.lock. If there is a primary gem, a single Package
  is created and the detected gem data within the gemfile.lock are assigned as
  dependencies. If there is no primary gem, then all of the dependencies are
  collected into a Package with no name and yielded.
- Fix issue where dependencies were not reported when scanning an extracted
  Python project by modifying BaseExtractedPythonLayout.assemble() to favor
  using package data from a PKG-INFO file from an egg-info directory. Package
  data from a PKG-INFO file from an egg-info directory contains the dependency
  information collected from the requirements.txt file alongside PKG-INFO.
- Fix issue where we were returning an incorrect purl package type for
  cocoapods: pods was being returned as the purl type, but it should be
  cocoapods instead. See
  https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#cocoapods
- Code for parsing a Maven POM, npm package.json, freebsd manifest and haxelib
  JSON has been separated into two functions: one that creates a PackageData
  object from the parsed Resource, and another that calls the previous function
  and yields the PackageData. This was done so that we can use the package
  manifest data parsing code outside of the scancode-toolkit context in other
  libraries.
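
The split into a builder function and a yielding wrapper can be pictured as follows. This is a simplified sketch, not ScanCode's code: the PackageData dataclass, the function names and the choice to take the manifest as a text string (rather than a Resource) are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class PackageData:
    # Hypothetical, simplified stand-in for ScanCode's PackageData model.
    type: str
    name: Optional[str] = None
    version: Optional[str] = None


def build_package_data(manifest_text: str) -> PackageData:
    """Create a PackageData from the text of an npm-style package.json."""
    data = json.loads(manifest_text)
    return PackageData(type="npm", name=data.get("name"), version=data.get("version"))


def parse(manifest_text: str) -> Iterator[PackageData]:
    """Thin wrapper that calls build_package_data() and yields its result."""
    yield build_package_data(manifest_text)
```

Other libraries can call the builder directly and get a single object back, while the scanning pipeline keeps consuming the generator.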
License detection:
- The SPDX license list has been updated to the latest v3.19.
- This is a major update to license detection where we now combine one or more
  license matches in a larger license detection. This approach improves the
  accuracy of license detection and removes a larger number of false positive
  or ambiguous license detections. See #2878 for details.
- There is a new license_detections codebase level attribute with all the
  unique license detections in the whole scan, both in resources and packages.
  This has the 3 attributes also present in package/resource level license
  detections: license_expression, matches and detection_log, and has two
  additional attributes:
  - identifier: the license_expression with a UUID created out of the
    detection contents; it is the same for identical detections.
  - count: Number of times in the codebase this unique license detection
    was encountered.
- The data structure of the JSON output has changed for licenses at file level:
  - The licenses attribute is deleted.
  - A new for_license_detections attribute is added which references the
    codebase level unique license detections; this is a list of identifier
    strings from the codebase level license detections it references.
  - A new license_detections attribute contains license detections in that
    file. This object has three attributes: license_expression, detection_log
    and matches. matches is a list of license matches and is roughly the same
    as licenses in the previous version, with additional structure changes
    detailed below.
  - A new attribute license_clues contains license matches with the same data
    structure as the matches attribute in license_detections. This contains
    license matches that are mere clues and were not considered to be a
    proper conclusive license detection.
  - The license_expressions list of license expressions is deleted and
    replaced by a single detected_license_expression expression. Similarly,
    spdx_license_expressions was removed and replaced by
    detected_license_expression_spdx.
  - See the license updates documentation at
    https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-resource
    for examples and details.
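
Since for_license_detections holds identifier strings, a consumer of this release candidate's output can resolve them against the codebase level list. The sketch below assumes the scan is a plain dict with a top-level license_detections list and that each file entry is a dict; the resolve_file_detections helper is hypothetical.

```python
def resolve_file_detections(scan, resource):
    """Return the codebase level detections referenced by one file entry."""
    # Index the codebase level detections by their identifier string.
    by_identifier = {
        detection["identifier"]: detection
        for detection in scan.get("license_detections", [])
    }
    return [
        by_identifier[identifier]
        for identifier in resource.get("for_license_detections", [])
        # Skip references that have no codebase level entry.
        if identifier in by_identifier
    ]
```

Building the index once per scan avoids rescanning the codebase level list for every file.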
- The data structure of license attributes in package_data and the codebase
  level packages has been updated accordingly:
  - There is a new license_detections attribute for the primary, top-level
    declared licenses of a package and an other_license_detections attribute
    for the other secondary detections.
  - The license_expression is replaced by the declared_license_expression and
    other_license_expression attributes with their SPDX counterparts
    declared_license_expression_spdx and other_license_expression_spdx.
    These expressions are parallel to detections.
  - The declared_license attribute is renamed extracted_license_statement
    and is now a YAML-encoded string. See the license updates documentation at
    https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#change-in-license-data-format-package
    for examples and details.
- The license matches structure has changed: we used to report one match for
  each license key of a matched license expression. We now report instead one
  single match for each matched license expression, and list the license keys
  as a licenses attribute. This avoids data duplication. Inside each match, we
  list each match and matched rule attributes directly, avoiding nesting. See
  the license updates documentation at
  https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#licensematch-result-data
  for examples and details.
- There are new codebase level attributes, included by default with
  --licenses, to report reference license metadata and texts once for each
  license matched across the scan; we now have two codebase level attributes:
  license_references and license_rule_references that list unique detected
  licenses and license rules. This reference data is also removed from license
  matches at all levels, i.e. from codebase, package and resource level
  license detections and resource level license clues. See the license updates
  documentation at
  https://scancode-toolkit.readthedocs.io/en/latest/explanations/license-detection-reference.html#comparision-before-after-license-references
  for examples and details.
- We replaced the scancode --reindex-licenses command line option with a new
  separate command named scancode-reindex-licenses.
  - The --reindex-licenses-for-all-languages CLI option is also moved to the
    scancode-reindex-licenses command as an option --all-languages.
  - We can now detect licenses using custom license texts and license rules
    stored in a directory or packaged as a plugin for consistent reuse and
    deployment.
  - There is an --additional-directory option with the
    scancode-reindex-licenses command to add the licenses from a directory.
  - There is also an --only-builtin option to use only builtin licenses,
    ignoring any additional license plugins.
  - See #480 for more details.
- We combined the license data file and text file of each license in a single
  file with a .LICENSE extension. The .yml data file is now included at the
  top of each .LICENSE file as "YAML frontmatter". The same applies to license
  rules and their .RULE and .yml files. This halves the number of data files
  from about 60,000 to 30,000. Git line history is preserved for the combined
  text + yml files.
  - See #3049
- There is a new console script scancode-license-data to export license data
  in JSON, YAML and HTML, with indexes and a static website for use in the
  licensedb web site. This becomes the API way to get scancode license data.
  See #2738
- The deprecated "--is-license-text" option has been removed. This is now
  built-in with the --license-text option and --info and exposed with the
  "percentage_of_license_text" attribute.
All Changes
- Add support for external licenses in scans by @KevinJi22 in #2979
- Separate Package parsing functions by @JonoYang in #3135
- Update docs for deprecated and other options #3126 by @AyanSinhaMahapatra in #3127
- Add license dump option by @AyanSinhaMahapatra in #3100
- Combine license matches in new LicenseDetection by @AyanSinhaMahapatra in #2961
- Fix issue 3155 by running scancode-reindex-licenses subcommand instead of using --reindex-licenses flag by @abhi-kr-2100 in #3159
- Detect wurfl commercial license by @pombredanne in #3163
- Do not use packaging.LegacyVersion #3171 #3177 by @pombredanne in #3180
- More License Detection changes by @AyanSinhaMahapatra in #3154
- docs(fix): how to install Py. 3...
v31.2.4
This is a bugfix release.
There is a fix for a license index issue caused by the new "attrs" version 22.2.0: objects pickled with
the previous version of attrs (the pickled index) cannot be unpickled with newer versions.
We have vendored attrs using vendorize for use in the license index so that it isn't impacted by
new package versions. See more details at #3179
Full Changelog: v31.2.3...v31.2.4