Releases: aboutcode-org/scancode-toolkit
v31.2.3
This is a bugfix release.
There is a fix for an installation issue with the new "packaging" version 22.0.
This is replaced by a fork named "packvers" to work around
pypa/packaging#530 and provide an emergency
fix for #3171 and #3177
We updated these dependencies:
With the new:
We also improved the compatibility for pre-built wheels and now build one
wheel for each Python version to work around some Python pickle bug.
We pinned SPDX tools to cope with the upcoming API-breaking changes.
Full Changelog: v31.2.1...v31.2.3
v31.2.1
This is a minor release with small bug fixes and minor feature updates.
- Update SPDX license list to 3.18
- Improve how we discard license matches that are "gibberish"
- Add new and improve existing license and license detection rules
What's Changed
- Prepare Release 31.1.1 by @pombredanne in #3085
- Process Gemfile.lock processing #3072 by @JonoYang in #3090
- Prefer using PKG-INFO from .egg-info in assemble #3083 by @JonoYang in #3091
- Correct purl type for cocoapods #3081 by @AyanSinhaMahapatra in #3096
- Fixed restructuredtext bulleted list to use * by @bwjohnson-ss in #3116
- Restore license texts of deprecated licenses by @pombredanne in #3101
- GitHub Workflows security hardening by @sashashura in #3117
- Update plugins docs and fix links by @AyanSinhaMahapatra in #3110
- Replace gemfileparser with gemfileparser2 by @JonoYang in #3098
- Yield package before assigning to resource by @JonoYang in #3115
- Fix summary holder bug by @AyanSinhaMahapatra in #3114
- Improve Author Detection by @AyanSinhaMahapatra in #3119
- Prepare release 31.2 by @pombredanne in #3104
New Contributors
- @bwjohnson-ss made their first contribution in #3116
- @sashashura made their first contribution in #3117
Full Changelog: v31.1.1...v31.2.1
v31.1.1
This is a minor release with a bug fix.
- Do not display tracing/debug outputs at runtime reported by @soimkim
Full Changelog: v31.1.0...v31.1.1
v31.1.0
v31.1.0 - 2022-08-29
This is a minor release with critical bug fixes and minor updates.
- Fix a critical bug in license detection
What's Changed
- Hot fix for license scan failure #3067 by @pombredanne in #3070
Full Changelog: v31.0.2...v31.1.0
v31.0.2
This is a minor release with minor bug fixes and feature improvements.
What's Changed
- Improve license detection with rules and licenses by @pombredanne in #3030
- Fix issues in `PythonInstalledWheelMetadataFile.assign_package_to_resources()` by @JonoYang in #3062
- Add new and improved licenses and rules - summer 2022 by @pombredanne in #3064
- Prepare release 31.0.2 by @pombredanne in #3065
Full Changelog: v31.0.1...v31.0.2
v31.0.1
This is a major release with important bug and security fixes, new and improved
features and API changes.
Note that we no longer support Python 3.6. Use Python 3.7+ instead.
Important API changes:
- The data structure of the JSON output has changed for copyrights, authors
  and holders. We now use a proper name for attributes and not a generic "value".
- The data structure of the JSON output has changed for packages. We now
  return "package_data" package information at the manifest file-level
  rather than "packages". This has all the data attributes of a "package_data"
  field plus others: "package_uuid", "package_data_files" and "files".
  - There is a new top-level "packages" attribute that contains package
    instances that can aggregate data from multiple manifests.
  - There is a new top-level "dependencies" attribute that contains each
    dependency instance; these can be standalone or related to a package.
    These contain a new "extra_data" object.
  - There is a new resource-level attribute "for_packages" which refers to
    packages through package_uuids (pURL + uuid string).
- The data structure for HTML output has been changed to include emails and
  urls under the "infos" object. The HTML template displays output for holders,
  authors, emails, and urls in separate tables like "licenses" and "copyrights".
- The data structure for CSV output has been changed to rename the Resource
  column to "path". "copyright_holder" has been renamed to "holder". The CSV
  output is deprecated and will be replaced in the future by an improved tabular
  format.
- The license clarity scoring plugin has been overhauled to show new license
  clarity criteria. More details of the new scoring criteria are provided below.
- The functionality of the summary plugin has been improved to provide declared
  origin and license information for the codebase being scanned. The previous
  summary plugin functionality has been preserved in the new `tallies`
  plugin. More details are provided below.
- ScanCode has adopted the new code skeleton from https://github.com/nexB/skeleton
  The key change is the location of the virtual environment. It used to be
  created at the root of the scancode-toolkit directory. It is now created
  under the `venv` subdirectory. You must be aware of this if you use ScanCode
  from a git clone.
- `DatafileHandler.assemble()`, `DatafileHandler.assemble_from_many()`, and
  the other `.assemble()` methods from the other Package handlers from
  packagedcode have been updated to yield Package items before Dependency or
  Resource items. This is particularly important in the case where we are calling
  the `assemble()` method outside of the scancode-toolkit context, where we
  need to ensure that a Package exists before we associate a Resource or
  Dependency to it.
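The reason the yield order matters to a caller outside scancode-toolkit can be shown with a small sketch. The `Package`, `Dependency` and `Resource` classes below are hypothetical stand-ins, not the packagedcode models, and `assemble_sketch()` and `collect()` are illustrative names only. The consumer processes items in a single pass and rejects any reference to a package it has not seen yet, which only works if Package items come first.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Union


# Hypothetical stand-ins for illustration; these are NOT the packagedcode models.
@dataclass
class Package:
    package_uid: str


@dataclass
class Dependency:
    purl: str
    for_package_uid: str


@dataclass
class Resource:
    path: str
    for_packages: list = field(default_factory=list)


def assemble_sketch() -> Iterator[Union[Package, Dependency, Resource]]:
    """Yield the Package first, then the items that refer to it."""
    package = Package(package_uid="pkg:pypi/example@1.0?uuid=hypothetical")
    yield package
    yield Dependency(purl="pkg:pypi/requests", for_package_uid=package.package_uid)
    yield Resource(path="example/setup.py", for_packages=[package.package_uid])


def collect(items: Iterable) -> dict:
    """Consume items in one pass, rejecting references to unknown packages."""
    packages, dependencies, resources = {}, [], []
    for item in items:
        if isinstance(item, Package):
            packages[item.package_uid] = item
        elif isinstance(item, Dependency):
            if item.for_package_uid not in packages:
                raise ValueError(f"dependency {item.purl} refers to an unknown package")
            dependencies.append(item)
        elif isinstance(item, Resource):
            missing = [uid for uid in item.for_packages if uid not in packages]
            if missing:
                raise ValueError(f"resource {item.path} refers to unknown packages")
            resources.append(item)
    return {"packages": packages, "dependencies": dependencies, "resources": resources}


result = collect(assemble_sketch())
```

A single-pass consumer like this needs no buffering or second pass, which is the practical benefit of yielding Package items before anything that points at them.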
Copyright detection:
- The data structure in the JSON is now using consistently named attributes as
  opposed to plain values.
- Several copyright detection bugs have been fixed.
- French and German copyright detection is improved.
- Some spurious trailing dots in holders are now stripped.
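For tools that consumed the older JSON, the attribute rename can be handled with a small adapter. The sketch below assumes that the named attributes are `copyright`, `holder` and `author` inside the `copyrights`, `holders` and `authors` lists; that mapping is an assumption to check against real scan output, and `upgrade_resource()` is a hypothetical helper, not part of ScanCode.

```python
# Assumed mapping from each list name to its named attribute; verify it
# against actual scan output before relying on it.
NAMED_ATTRIBUTES = {
    "copyrights": "copyright",
    "holders": "holder",
    "authors": "author",
}


def upgrade_resource(resource: dict) -> dict:
    """Return a copy of an old-style resource with "value" keys renamed."""
    upgraded = dict(resource)
    for list_name, attribute in NAMED_ATTRIBUTES.items():
        if list_name not in resource:
            continue
        upgraded[list_name] = [
            {attribute if key == "value" else key: val for key, val in entry.items()}
            for entry in resource[list_name]
        ]
    return upgraded


old_resource = {
    "path": "src/example.c",
    "copyrights": [{"value": "Copyright (c) Example Corp", "start_line": 1, "end_line": 1}],
    "holders": [{"value": "Example Corp", "start_line": 1, "end_line": 1}],
}
new_resource = upgrade_resource(old_resource)
```

The helper builds new lists rather than editing entries in place, so the original mapping is left untouched.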
License detection:
- There have been significant license detection rules and licenses updates:
  - 107 new licenses have been added (total is now 1954)
  - 6780 new license detection rules have been added (total is now 32259)
  - 6753 existing false positive license rules have been removed (see below).
  - The SPDX license list has been updated to the latest v3.17
- The rule attribute "only_known_words" has been renamed to "is_continuous" and its
  meaning has been updated and expanded. A rule tagged as "is_continuous" can only
  be matched if there are no gaps between matched words, be they stopwords, extra
  unknown or known words. This improves several false positive license detections.
  The processing for "is_continuous" has been merged in "key phrases" processing
  below.
- Key phrases can now be defined in a RULE text by surrounding one or more words
  with double curly braces `{{` and `}}`. When defined, a RULE will only match
  when the key phrases match exactly. When all the text of a rule is a "key phrase",
  this is the same as being "is_continuous".
- The "--unknown-licenses" option now also detects unknown licenses using a
  simple and effective ngrams-based matching in areas that are not matched or
  weakly matched. This helps detect things that look like a license but are not
  yet known as licenses.
- False positive detection of "license lists" like the lists seen in license and
  package management tools has been entirely reworked. Rather than using
  thousands of small false positive rules, there is a new filter to detect a
  long run of license references and tags that is typical of license lists.
  As a result, thousands of rules have been replaced by a simpler filter, and
  the license detection is more accurate, faster and has fewer false
  positives.
- The new license flag "is_generic" tags licenses that are "generic" licenses
  such as "other-permissive" or "other-copyleft". This is not yet
  returned in the JSON API.
- When scanning binary files, the detection of single word rules is filtered when
  surrounded by gibberish or mixed case. For instance `$#%$GpL$` is a false
  positive and is no longer reported.
- Several rules were tagged as is_license_notice incorrectly but were references
  and have been requalified as is_license_reference. All rules made of a single
  word have been requalified as is_license_reference if they were not qualified
  this way.
- Matches to small license rules (with small defined as under 15 words)
  that are scattered over too many lines are now filtered as false matches.
- Small, two-word matches that overlap the previous or next match by
  the word "license" and assimilated are now filtered as false matches.
- The new --licenses-reference option adds a new "licenses_reference" top
  level attribute to a scan when using the JSON and YAML outputs. This contains
  all the details and the full text of every license seen in a file or
  package license expression of a scan. This can be added after the fact
  using the --from-json option.
- New experimental support for non-English licenses. Use the command
  ./scancode --reindex-licenses-for-all-languages to index all known non-English
  licenses and rules. From that point on, they will be detected. Because of this
  some licenses that were not tagged with their languages are now correctly
  tagged and they may not be detected unless you activate this new indexing
  feature.
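The key phrase idea can be illustrated with a simplified sketch. This is not ScanCode's matching code: the tokenizer, the regular expression and the function names are all assumptions made for illustration. It extracts the phrases wrapped in `{{` and `}}` from a rule text and accepts a candidate text only if each phrase appears as an unbroken run of words.

```python
import re

# Non-greedy capture of whatever sits between {{ and }} in a rule text.
KEY_PHRASE = re.compile(r"\{\{(.+?)\}\}")


def tokenize(text: str) -> list:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def key_phrases(rule_text: str) -> list:
    """Return each key phrase of a rule as a list of tokens."""
    return [tokenize(phrase) for phrase in KEY_PHRASE.findall(rule_text)]


def contains_run(tokens: list, phrase: list) -> bool:
    """True if phrase occurs in tokens with no gap between its words."""
    size = len(phrase)
    return any(tokens[i:i + size] == phrase for i in range(len(tokens) - size + 1))


def key_phrases_match(rule_text: str, candidate_text: str) -> bool:
    """True if every key phrase of the rule is found intact in the candidate."""
    tokens = tokenize(candidate_text)
    return all(contains_run(tokens, phrase) for phrase in key_phrases(rule_text))


rule = "Licensed under the {{Apache License}}, version 2.0"
exact = key_phrases_match(rule, "Licensed under the Apache License, Version 2.0")
broken = key_phrases_match(rule, "Licensed under the Apache Software License")
```

Here `exact` is `True` and `broken` is `False`, because the extra word "Software" splits the key phrase. Wrapping the entire rule text in one pair of braces makes the whole rule a single run, which mirrors the "is_continuous" behaviour described above.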
Package detection:
- Major changes in package detection and reporting, codebase-level attribute
  `packages` with one or more `package_data` and files for the packages are reported.
  The specific changes made are:
  - The resource level attribute `packages` has been renamed to `package_data`,
    as these are really package data that are being detected, such as manifests,
    lockfiles or other package data. This has the data attributes of a `package_data`
    field plus others: `package_uuid`, `package_data_files` and `files`.
  - A new top-level attribute `packages` has been added which contains package
    instances created from `package_data` detected in the codebase.
  - A new codebase level attribute `dependencies` has been added which contains dependency
    instances created from lockfiles detected in the codebase.
  - The package attribute `root_path` has been deleted from `package_data` in favour
    of the new format where there is no root conceptually, just a list of files for each
    package.
  - There is a new resource-level attribute `for_packages` which refers to
    packages through package_uids (pURL + uuid string). A `package_adder`
    function is now used to associate a Package to a Resource that is part of
    it. This gives us the flexibility to use the packagedcode Package handlers
    in other contexts where `for_packages` on Resource is not implemented in the
    same way as scancode-toolkit.
  - The package_data attribute `dependencies` (which is a list of DependentPackages),
    now has a new attribute `resolved_package` with a package data mapping.
    Also the `requirement` attribute is renamed to `extracted_requirement`.
    There is a new `extra_data` to collect extra data as needed.
- For Pypi packages, python_requires is treated as a package dependency.
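A consumer of the new layout might group files by the package they belong to, as in the sketch below. The `scan` mapping is a hand-written, hypothetical fragment rather than real scan output. The key names `files` and `package_uid` and the shape of the uid string are assumptions, and `files_by_package()` is an illustrative helper, not a ScanCode function.

```python
from collections import defaultdict

# Hypothetical uid in the "pURL + uuid string" style described above.
DEMO_UID = "pkg:npm/demo@1.0.0?uuid=hypothetical-1"

# Hand-written fragment for illustration only; real output has many more fields.
scan = {
    "packages": [{"package_uid": DEMO_UID, "name": "demo"}],
    "dependencies": [{"purl": "pkg:npm/left-pad", "extra_data": {}}],
    "files": [
        {"path": "demo/package.json", "package_data": [{"name": "demo"}], "for_packages": [DEMO_UID]},
        {"path": "demo/index.js", "package_data": [], "for_packages": [DEMO_UID]},
        {"path": "README", "package_data": [], "for_packages": []},
    ],
}


def files_by_package(scan_data: dict) -> dict:
    """Map each package uid to the paths of the resources that refer to it."""
    grouped = defaultdict(list)
    for resource in scan_data.get("files", []):
        for uid in resource.get("for_packages", []):
            grouped[uid].append(resource["path"])
    return dict(grouped)


grouped_paths = files_by_package(scan)
```

Because there is no `root_path` any more, the list of files per package comes from the `for_packages` references on each resource rather than from a directory prefix.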
License Clarity Scoring Update:
- We are moving away from the original license clarity scoring designed for
  ClearlyDefined in the license clarity score plugin. The previous license
  clarity scoring logic produced a score that was misleading when it would
  return a low score due to the stringent scoring criteria. We are now using
  more general criteria to get a sense of what provenance information has been
  provided and whether or not there is a conflict in licensing between what
  licenses were declared at the top-level key files and what licenses have been
  detected in the files under the top-level.
- The license clarity score is a value from 0-100 calculated by combining the
  weighted values determined for each of the scoring elements:
  - Declared license:
    - When true, indicates that the software package licensing is documented at
      top-level or well-known locations in the software project, typically in a
      package manifest, NOTICE, LICENSE, COPYING or README file.
    - Scoring Weight = 40
  - Identification precision:
    - Indicates how well the license statement(s) of the software identify known
      licenses that can be designated by precise keys (identifiers) as provided in
      a...
v31.0.0rc5
This is one of the last release candidates for the upcoming 31 release.
v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.
Several bugs have been fixed when compared with 31.0.0rc3, in particular the ability to properly report licenses in system package scans.
See https://github.com/nexB/scancode-toolkit/blob/v31.0.0rc5/CHANGELOG.rst for an overview of the changes in v31 compared to v30.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!
What's Changed since 31 rc3
- Release 31 rc4 prep by @pombredanne in #3036
- Add package_adder argument to assemble() #3034 by @JonoYang in #3035
Full Changelog: v31.0.0rc3...v31.0.0rc5
v31.0.0rc3
This is the penultimate release candidate for the upcoming 31 release.
v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.
Several bugs have been fixed when compared with 31.0.0rc2.
See https://github.com/nexB/scancode-toolkit/blob/v31.0.0rc3/CHANGELOG.rst for an overview of the changes in v31 compared to v30.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!
What's Changed
- Do not fail without packages in cyclonedx #2987 by @AyanSinhaMahapatra in #3005
- Fix relaunching scancode on Apple silicon using Rosetta 2 emulation #2835 by @MarcelBochtler in #3018
- Clarify `unknown` license keys #2827 by @AyanSinhaMahapatra in #3023
- Yield Packages before other yieldables #3028 by @pombredanne in #3031
- Prepare Release 31.0.0rc3 by @pombredanne in #3029
New Contributors
- @MarcelBochtler made their first contribution in #3018
Full Changelog: v31.0.0rc2...v31.0.0rc3
v31.0.0rc2
This is a release candidate for the upcoming 31 release.
v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.
Several bugs have been fixed when compared with 31.0.0rc1.
See https://github.com/nexB/scancode-toolkit/blob/v31.0.0rc2/CHANGELOG.rst for an overview of the changes in v31 compared to v30.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!
What's Changed
- Improve npm package processing by @pombredanne in #2997
- Update license detection by @pombredanne in #2998
- Add new license rules and license - Early summer 2022 by @pombredanne in #2999
- Bump version to 31.0.0rc2 by @JonoYang in #3000
Full Changelog: v31.0.0rc1...v31.0.0rc2
v31.0.0rc1
This is a release candidate for the upcoming 31 release.
v31 is a major release with many new features, and several bug fixes and
improvements including major updates to the package and dependency collection and to the license detection.
Several bugs have been fixed when compared with 31.0.0b5.
See https://github.com/nexB/scancode-toolkit/blob/v31.0.0rc1/CHANGELOG.rst for an overview of the changes in v31 compared to v30.
Please try this release and report any installation issues so we can work towards a stable 31.
Thank you!
What's Changed
- Add black and isort as testing dependencies #2969 by @johnmhoran in #2970
- Rename precise_license_detection field #2967 by @JonoYang in #2968
- Convert package data dict to PackageData #2971 by @JonoYang in #2973
- Update extractcode --shallow option description by @lf32 in #2959
- Support shortcut flags for cli by @lf32 in #2951
- Consider only copyrights in summry #2972 by @JonoYang in #2974
- Reimplement get installed packages by @JonoYang in #2988
- Report extracted_requirement correctly by @TG1999 in #2984
- Improve packagecode and other release prep by @pombredanne in #2992
New Contributors
Full Changelog: v31.0.0b5...v31.0.0rc1