Refactor License Scout
This is a major refactor of License Scout that modifies its operating
model to drive more configuration from external configuration managed
by YAML files rather than by configuration stored within the gem itself.

This allows teams to add exceptions, fallbacks, and other configuration
based on their needs without requiring us to release a new version of
the license_scout gem.

Signed-off-by: Tom Duffield <[email protected]>
tduffield committed Apr 12, 2018
1 parent dbf7f01 commit c49a041
Showing 273 changed files with 25,380 additions and 8,879 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -10,3 +10,4 @@
/tmp/
/spec/examples.txt
/license-cache/
*-dependency-licenses.json
7 changes: 4 additions & 3 deletions .rubocop.yml
@@ -1,4 +1,5 @@
AllCops:
TargetRubyVersion: 2.2
Excludes:
- spec/fixtures/**/*
Exclude:
- 'bundle/**/*'
- 'vendor/bundle/**/*'
- 'spec/fixtures/**/*'
1 change: 1 addition & 0 deletions .ruby-version
@@ -0,0 +1 @@
2.5.0
9 changes: 6 additions & 3 deletions .travis.yml
@@ -1,11 +1,14 @@
language: ruby
sudo: false

cache: bundler

# do not run expensive spec tests on PRs, only on branches
branches:
only:
- master

before_install: gem install bundler
rvm:
- 2.2.6
before_install:
- source $HOME/otp/19.3/activate
- erl -eval 'erlang:display(erlang:system_info(otp_release)), halt().' -noshell
- gem install bundler
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,4 +1,4 @@
FROM devchef/chefdk
FROM chef/chefdk

COPY bin/ /usr/src/app/license_scout/bin/
COPY lib/ /usr/src/app/license_scout/lib/
193 changes: 174 additions & 19 deletions README.md
@@ -1,33 +1,188 @@
# license_scout
# License Scout

LicenseScout discovers and collects the licenses of a project and its
dependencies, including transitive dependencies.
License Scout is a utility that discovers and aggregates the licenses for your software project's transitive dependencies.

Currently supported project types are:
Currently supported Dependency Types and Dependency Managers are:

* Chef - Berkshelf
* Erlang - rebar
* Golang - godeps
* Javascript - npm
* Perl - CPAN
* Ruby - bundler
Dependency Type | Supported Dependency Managers
--- | ---
chef_cookbook | berkshelf
erlang | rebar
elixir | mix
golang | dep, godep, glide
habitat | habitat
nodejs | npm
perl | cpan
ruby | bundler

## Dependencies

* If you wish to scan for `berkshelf` dependencies, you'll need to manually install the Berkshelf gem in the same Ruby as License Scout
* If you wish to scan for `mix` or `rebar` dependencies, you'll need to install Erlang OTP 18.3 or greater.

## Usage

License Scout's default behavior is to scan the current directory and return a breakdown of all the licenses it can find.

```bash
$ bin/license_scout /dir/to/scout/successfully/
my_project $ license_scout

+------+------------+------------+---------+
| Type | Dependency | License(s) | Results |
+------+------------+------------+---------+
...
```

LicenseScout will exit `0` if it was able to find licenses for all your dependencies. Otherwise, it will exit `1`.

Under the covers, License Scout leverages [Licensee](http://ben.balter.com/licensee/) (the same Ruby Gem [GitHub](https://developer.github.com/v3/licenses/) uses to detect [OSS licenses](https://spdx.org/licenses/)). In addition to using Licensee to scan your source code for licenses, License Scout will go a step further and attempt to determine if the metadata provided by the Dependency Manager specifies which license each dependency uses. At the end of the process, License Scout will provide you a Dependency Manifest with the following information:

1. The name of the license(s) (the SPDX ID, if it is a recognized open source license).
2. The name of the file where License Scout found the license.
3. The contents of the license file (if available).

In addition to the printout provided to STDOUT, License Scout will also save a JSON manifest of all your dependencies to disk.

```json
{
"license_manifest_version": 2,
"generated_on": "<DATE>",
"name": "<YOUR_PROJECT>",
"dependencies": [...]
}
```

For more information about the structure of the JSON manifest, please check out the full [JSON Schema](lib/license_scout/data/dependency_manifest_v2_schema.json).
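
As a rough illustration of consuming the manifest, the following Ruby sketch parses a manifest and prints a short summary. Only the four top-level keys shown above come from this README; the sample values and the `summarize_manifest` helper are hypothetical, and the shape of each entry in `dependencies` is deliberately not assumed.

```ruby
require "json"

# Hypothetical helper: pull the top-level fields out of a manifest document.
def summarize_manifest(json_text)
  manifest = JSON.parse(json_text)
  {
    version: manifest["license_manifest_version"],
    name: manifest["name"],
    dependency_count: manifest.fetch("dependencies", []).length,
  }
end

# Made-up sample manifest, used only for illustration.
sample = <<~JSON
  {
    "license_manifest_version": 2,
    "generated_on": "2018-04-12",
    "name": "my_project",
    "dependencies": []
  }
JSON

summary = summarize_manifest(sample)
puts "#{summary[:name]}: #{summary[:dependency_count]} dependencies (manifest v#{summary[:version]})"
```

In practice you would pass the contents of the JSON file that License Scout wrote to your `output_directory` instead of the inline sample.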

### Result Types

License Scout will provide a summary of the licenses it finds to STDOUT. These results are intended to provide direction as to which actions may or may not be necessary to generate a Dependency Manifest that meets all of your compliance requirements. To do this it categorizes its findings into the following results.

Result | Description
--- | ---
Flagged | License Scout was able to determine the license for this software dependency, and it is one of the licenses you have explicitly flagged. You should either remove the dependency or [add an Exception](#dependency-exceptions).
Missing | License Scout could not find any license files or license metadata associated with this dependency. You should contact the maintainer and/or specify a [Fallback License](#fallback-licenses).
Unpermitted | License Scout was able to determine the license for this software dependency, but it is not one of the licenses you have explicitly allowed. You should either remove the dependency or [add an Exception](#dependency-exceptions).
OK | There were no issues.
Undetermined | License Scout found a license file but was unable to determine (with sufficient confidence) what license that file represents. License Scout was also unable to determine the license using Dependency Manager metadata. You should contact the maintainer and/or specify a [Fallback License](#fallback-licenses).

## Advanced Usage

### Configuration File(s)

You can control License Scout's behavior by providing one or more YAML configuration files, available either locally or via HTTP, to the `--config-files` option of the CLI.

```bash
$ license_scout --config-files http://example.com/license_scout/common.yml,./.license_scout.yml
```

License Scout evaluates these files in the order they are provided, allowing you to hydrate configuration by composing multiple files together. For example, you can have a single organization-wide configuration file that specifies what licenses are allowed and a project-specific configuration file that specifies exceptions and which directories to scan.

#### How multiple configuration files are handled

License Scout uses [mixlib-config](https://github.com/chef/mixlib-config) to handle its configuration. When loading multiple configuration files, mixlib-config (and thus License Scout) will not perform deep merges of Arrays. That means that License Scout will not merge (for example) `allowed_licenses` (or `flagged_licenses`) from two different configuration files; it will only take the `allowed_licenses` value from the configuration that is loaded last. This logic does not apply to the `fallbacks` or `exceptions`, because those are defined as [`config_contexts`](lib/license_scout/config.rb). It **does** apply to the individual types specified within the `fallbacks` or `exceptions`, however.
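
The merge behavior described above can be pictured with a small plain-Ruby sketch. It does not call mixlib-config; `merge_config` is a hypothetical stand-in that only mimics the described semantics (nested hashes merge, arrays from the later file replace earlier ones), and the license and dependency names are made up.

```ruby
# Hypothetical stand-in for the described behavior; not mixlib-config's implementation.
def merge_config(earlier, later)
  earlier.merge(later) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      merge_config(old_value, new_value) # contexts such as `exceptions` are merged key by key
    else
      new_value # arrays (and scalars) from the later file win outright
    end
  end
end

# Stands in for an organization-wide file that is loaded first.
org_wide = {
  "allowed_licenses" => ["Apache-2.0", "MIT"],
  "exceptions" => { "ruby" => [{ "name" => "bundler" }] },
}

# Stands in for a project-specific file that is loaded last.
project = {
  "allowed_licenses" => ["BSD-3-Clause"],
  "exceptions" => {
    "ruby" => [{ "name" => "json (1.8.3)" }],
    "golang" => [{ "name" => "github.com/example/dep" }],
  },
}

merged = merge_config(org_wide, project)
p merged["allowed_licenses"]   # only the later file's list survives
p merged["exceptions"].keys    # types from both files are present
p merged["exceptions"]["ruby"] # but each type's array comes from the later file
```

If this model holds, a project file that needs the organization-wide `allowed_licenses` plus one more license has to repeat the full list rather than append to it.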

### Allowed and Flagged Licenses

$ bin/license_scout /dir/to/scout/unsuccessfully/
Dependency 'gopkg.in_yaml.v2' version '53feefa2559fb8dfa8d81baad31be332c97d6c77' under 'go_godep' is missing license information.
>> Found 41 dependencies for go_godep. 40 OK, 1 with problems
License Scout provides you with the ability to provide a list of licenses that are explicitly allowed, or a list of licenses that should be flagged for further scrutiny.

- When you specify a list of `allowed_licenses`, License Scout will exit `1` if it detects a dependency with a license other than one on the list.
- When you specify a list of `flagged_licenses`, License Scout will exit `1` if it finds a dependency with that license.

To add a license to the list of allowed or flagged licenses, list its ID as a string in the corresponding array in your configuration file. A configuration may have a list of allowed licenses _or_ a list of flagged licenses; it cannot have both. _License Scout does not support regular expressions or glob patterns for `allowed_licenses` or `flagged_licenses`._

```yaml
allowed_licenses:
- Apache-2.0

# OR

flagged_licenses:
- Apache-2.0
```
Detailed instructions for fixing licensing failures found by license_scout are now provided in the script's output. See [bin/license_scout](bin/license_scout) for more details.
License Scout will compare these string values to the licenses it finds within the dependencies. License Scout does its best to resolve everything down to valid [SPDX IDs](https://spdx.org/licenses/), so you should specify licenses using their SPDX ID.

> _Warning: Because we cannot control how maintainers specify licenses in their metadata, there may be a situation where License Scout cannot correctly detect the intended SPDX ID. In this case, you may need to temporarily provide a Fallback License in your configuration. If you encounter this situation, we encourage you to [open an Issue](https://github.com/chef/license_scout) with us._
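
The allowed/flagged rules above can be sketched as a small Ruby function. This illustrates the rules as this README describes them and is not License Scout's actual code: `result_for` and the sample dependency data are hypothetical, and the sketch ignores Exceptions, Fallback Licenses, and the Missing and Undetermined results.

```ruby
# Hypothetical sketch: classify one dependency whose license was determined.
# Comparison is by exact string, since regular expressions and globs are not supported.
def result_for(license_id, allowed: [], flagged: [])
  return "Flagged" if flagged.include?(license_id)
  return "Unpermitted" if allowed.any? && !allowed.include?(license_id)
  "OK"
end

# Made-up detection results, keyed by dependency name.
detected = { "bundler" => "MIT", "mixlib-config" => "Apache-2.0" }

results = detected.transform_values { |id| result_for(id, allowed: ["Apache-2.0"]) }
exit_code = results.values.all? { |result| result == "OK" } ? 0 : 1

results.each { |name, result| puts "#{name}: #{result}" }
puts "exit code: #{exit_code}"
```

With `allowed_licenses` set to only `Apache-2.0`, the MIT-licensed dependency comes back as Unpermitted and the run would exit `1`.
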

### Dependency Exceptions

If you specify a list of allowed or flagged licenses, there may be a dependency that does not adhere to the specified license(s) for which you wish to make an exception. License Scout allows you to specify Exceptions to these lists as part of your Configuration File.

```yaml
---
allowed_licenses:
- Apache-2.0

exceptions:
ruby:
- name: bundler
reason: Used only during .gem creation
- name: json (1.8.3)
```
Exceptions are organized by `type` (e.g. `ruby` - see Table above). There are two elements to each exception: a `name` and a `reason`.

Property | Description
--- | ---
`name` | Can be specified by `dep-name` or `dep-name (dep-version)` where `dep-name` is the name of the dependency as it exists in the Dependency Manifest and `dep-version` can be a traditional version, git reference, or type-specific version specification such as `$pkg_version-$pkg_release` for Habitat.
`reason` | An optional string that will be included in the Dependency Manifest for documentation purposes.

Simple glob-style pattern matching _is_ supported for Exceptions, so you can have an Exception for a large collection of dependencies without enumerating them all.

```yaml
---
exceptions:
chef_cookbook:
- name: apache2 (5.*)
reason: Allowed by TICKET-001
habitat:
- name: core/bundler (1.15.1-*)
reason: Only used for .gem creation
ruby:
- name: aws-sdk-*
reason: Exception granted by Bobo T. Clown on 2018/02/31
```
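
To get a feel for how such patterns behave, here is a minimal Ruby sketch built on `File.fnmatch?` from the standard library. Whether License Scout matches Exceptions this way is an assumption; `exception_matches?` is a hypothetical helper, and the versions are made up.

```ruby
# Hypothetical matcher: an Exception name may be "dep-name" or "dep-name (dep-version)",
# so the pattern is tried against both forms.
def exception_matches?(pattern, dep_name, dep_version)
  File.fnmatch?(pattern, dep_name) ||
    File.fnmatch?(pattern, "#{dep_name} (#{dep_version})")
end

puts exception_matches?("apache2 (5.*)", "apache2", "5.0.1")                              # => true
puts exception_matches?("apache2 (5.*)", "apache2", "4.0.0")                              # => false
puts exception_matches?("aws-sdk-*", "aws-sdk-core", "3.0.0")                             # => true
puts exception_matches?("core/bundler (1.15.1-*)", "core/bundler", "1.15.1-20180412")     # => true
```

Parentheses are literal characters in `fnmatch` patterns, so only `*`, `?`, and bracket sets act as wildcards in this sketch.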

### Fallback Licenses

In situations where License Scout is unable to determine the license for a particular dependency, either because Licensee was not able to identify any of the license files or the Dependency Manager did not provide any metadata that indicated how the dependency was licensed, you'll need to provide a Fallback License in your configuration. Like Exceptions, Fallback Licenses are grouped by `type`.

```yaml
fallbacks:
golang:
- name: github.com/dchest/siphash
license_id: CC0-1.0
license_content: https://raw.githubusercontent.com/dchest/siphash/master/README.md
```

Property | Description
--- | ---
name | The name of the dependency as it appears in the JSON manifest.
license_id | The ID of the license as it appears in the JSON manifest.
license_content | A URL to a file where the raw text of the license can be downloaded.

In addition to including any files Licensee identified as potential license files (but couldn't identify), License Scout will also include the Fallback License you specified in the Dependency Manifest.

## Configuration

Value | Description | Default
--- | --- | ---
directories | The fully-qualified local paths to the directories you wish to scan | _The current working directory._ |
name | The name you want to give to the scan result. | _The basename of the first directory to be scanned._ |
output_directory | The path to the directory where the output JSON file should be saved. | _The current working directory._ |
log_level | What log information should be included in STDOUT | `info` |
allowed_licenses | Only allow dependencies to have these licenses. | `[]` |
flagged_licenses | An array of licenses that should be flagged for removal or exception. | `[]` |
exceptions | An array of Exceptions. | `[]` |
environment | A hash of additional Environment Variables to pass to [mixlib-shellout](https://github.com/chef/mixlib-shellout) | `{}` |
escript_bin | The path to the `escript` binary you wish to use when shelling out to Erlang. | `escript` |
ruby_bin | The path to the `ruby` binary you wish to use when shelling out to Ruby. | `ruby` |
cpanm_root | The path to where the cpanminus install cache is located. | `~/.cpanm` |

## Contributing

This project is maintained by the contribution guidelines identified for
[chef](https://github.com/chef/chef) project. You can find the guidelines here:
This project is maintained by the contribution guidelines identified for [chef](https://github.com/chef/chef) project. You can find the guidelines here:

https://github.com/chef/chef/blob/master/CONTRIBUTING.md

@@ -36,5 +191,5 @@ Pull requests in this project are merged when they have two :+1:s from maintaine
## Maintainers

- [Dan DeLeo](https://github.com/danielsdeleo)
- [Serdar Sutay](https://github.com/sersut)
- [Ryan Cragun](https://github.com/ryancragun)
- [Tom Duffield](https://github.com/tduffield)

9 changes: 8 additions & 1 deletion Rakefile
@@ -17,6 +17,7 @@

require "bundler/gem_tasks"
require "rspec/core/rake_task"
require "open-uri"

task default: :test

@@ -36,4 +37,10 @@ rescue LoadError
end

desc "Run all tests"
task test: [:spec]
task test: [:spec, :style]

desc "Refresh the SPDX JSON database"
task :spdx do
IO.copy_stream(open("https://spdx.org/licenses/licenses.json"), File.expand_path("./lib/license_scout/data/licenses.json"))
IO.copy_stream(open("https://spdx.org/licenses/exceptions.json"), File.expand_path("./lib/license_scout/data/exceptions.json"))
end
26 changes: 21 additions & 5 deletions appveyor.yml
@@ -1,4 +1,4 @@
os: Windows Server 2012 R2
os: Visual Studio 2017
platform:
- x64

@@ -7,13 +7,29 @@ branches:
only:
- master

cache:
- bundle

environment:
matrix:
- ruby_version: "25-x64"

install:
- set PATH=C:\Ruby22\bin;%PATH%
- systeminfo
- winrm quickconfig -q
- SET PATH=C:\Ruby%ruby_version%\bin;%PATH%
- echo %PATH%
- appveyor DownloadFile http://curl.haxx.se/ca/cacert.pem -FileName C:\cacert.pem
- set SSL_CERT_FILE=C:\cacert.pem

build_script:
- bundle install || bundle install || bundle install

build: off

before_test:
- ruby --version
- gem --version
- bundler --version
- bundle env

test_script:
- bundle exec rake spec
- bundle exec rake
60 changes: 2 additions & 58 deletions bin/license_scout
@@ -18,62 +18,6 @@

$:.unshift File.expand_path("../../lib", __FILE__)

require "license_scout/collector"
require "license_scout/overrides"
require "license_scout/options"
require "license_scout"

project_dir = ARGV[0] || File.expand_path(Dir.pwd)
project_name = File.basename(project_dir)

# Create the output files under a specific directory in order not to pollute the
# project_dir too much.
output_dir = File.join(project_dir, "license-cache")

overrides = LicenseScout::Overrides.new

opts = LicenseScout::Options.new(overrides: overrides)

collector = LicenseScout::Collector.new(project_name, project_dir, output_dir, opts)

collector.run
report = collector.issue_report

unless report.empty?
puts report

puts <<-EXPLANATION
How to fix this depends on what information license_scout was unable to
determine:
* If the package is missing license information, that means license_scout was
unable to determine which license the package was released under. Depending
on the package manager, this is usually specified in the package's metadata,
for example, in the gemspec file for rubygems or in the package.json for npm.
If you know which license a package was released under, MIT for example, you
can add an override in license_scout's overrides.rb file in the section for
the appropriate package manager like this:
["package-name", "MIT", nil]
* If the package is missing the license file, that means license_scout could not
find the license text in any of the places the license is typically found, for
example, in a file named LICENSE in the root of the package. If the package
includes the license text in a non standard location or in its source repo,
you can indicate this by adding an override in license_scout's overrides.rb
file in the section for the appropriate package manager like this:
["package-name", nil, ["https://example.com/foocorp/package-name/master/LICENSE"]],
If you know that the package was released under one of the common software
licenses, MIT for example, but does not include the license text in packaged
releases or in its source repo, you can add an override in license_scout's
overrides.rb file in the section for the appropriate package manager like
this:
["package-name", nil, [canonical("MIT")]]
See the closed pull requests on the license_scout repo for examples of how to
do this:
https://github.com/chef/license_scout/pulls?q=is%3Apr+is%3Aclosed
EXPLANATION

exit 2
end
LicenseScout::CLI.new.run
Binary file added bin/mix_lock_json
Binary file not shown.
Binary file modified bin/rebar_lock_json
Binary file not shown.
2 changes: 2 additions & 0 deletions lib/license_scout.rb
@@ -17,3 +17,5 @@

module LicenseScout
end

require "license_scout/cli"