10 changes: 5 additions & 5 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: kernelshap
Title: Kernel SHAP
Version: 0.9.0
Version: 1.0.0
Authors@R: c(
person("Michael", "Mayer", , "[email protected]", role = c("aut", "cre"),
comment = c(ORCID = "0009-0007-2540-9629")),
@@ -19,18 +19,18 @@ Description: Efficient implementation of Kernel SHAP (Lundberg and Lee,
It supports multi-output models, case weights, and parallel
computations. Visualizations can be done using the R package
'shapviz'.
License: GPL (>= 2)
License: GPL (>= 3)
Depends:
R (>= 3.2.0)
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
Imports:
doFuture,
foreach,
Imports:
future.apply,
stats,
utils
Suggests:
progressr,
testthat (>= 3.0.0)
Config/testthat/edition: 3
URL: https://github.com/ModelOriented/kernelshap
881 changes: 570 additions & 311 deletions LICENSE.md

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion NAMESPACE
@@ -10,4 +10,3 @@ export(additive_shap)
export(is.kernelshap)
export(kernelshap)
export(permshap)
importFrom(doFuture,"%dofuture%")
45 changes: 23 additions & 22 deletions NEWS.md
@@ -1,4 +1,4 @@
# kernelshap 0.9.0
# kernelshap 1.0.0

### Bug fix

@@ -12,41 +12,42 @@ we have fixed a bug in how `kernelshap()` calculates Kernel weights.
Fixed in [#168](https://github.com/ModelOriented/kernelshap/pull/168), which has also received
unit tests against Python's "shap".

### API
### Major improvements for parallel computation

We have switched from foreach's `%dopar%` to `future.apply::future_lapply()`:

- We have a progress bar! Activate via `progressr::handlers(global = TRUE)`.
- No need to call `registerDoFuture()` anymore.
- No need to set `parallel = TRUE` anymore.
- No need to declare missing packages via `parallel_args` anymore. Should you still encounter missing packages,
use the new argument `future.packages`.
- Random seeding is now properly handled; thanks to [#163](https://github.com/ModelOriented/kernelshap/issues/163) for reporting.

Implemented in PRs [#170](https://github.com/ModelOriented/kernelshap/pull/170) and [#173](https://github.com/ModelOriented/kernelshap/pull/173).
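In practice, the new workflow can be sketched as follows (a minimal sketch; `fit`, `X`, and `bg_X` are hypothetical placeholders for a fitted model, the rows to explain, and background data):

```r
library(kernelshap)
library(future)

# Optional: progress bar, which now also works in parallel mode
progressr::handlers(global = TRUE)

# Activate parallel processing; no registerDoFuture() or parallel = TRUE needed
plan(multisession)

# 'fit', 'X', and 'bg_X' are hypothetical placeholders
s <- kernelshap(fit, X, bg_X = bg_X, seed = 1)

# Switch back to sequential processing when done
plan("sequential")
```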

### API changes and improvements

- The traditional progress bar has been replaced by {progressr}. Activate it, e.g., via `progressr::handlers(global = TRUE)`.
It works in parallel mode as well, and you can choose your own style ([#173](https://github.com/ModelOriented/kernelshap/pull/173)).
- The argument `feature_names` can now also be used with matrix input ([#166](https://github.com/ModelOriented/kernelshap/pull/166)).
- `kernelshap()` and `permshap()` have received a `seed = NULL` argument ([#170](https://github.com/ModelOriented/kernelshap/pull/170)).
- Parallel mode: If missing packages or globals have to be specified, this now has to be done through `parallel_args = list(packages = ..., globals = ...)`
instead of `parallel_args = list(.packages = ..., .globals = ...)`, see section on parallelism below.
The list is passed to `[foreach::foreach(.options.future = ...)]`.

### Speed and memory improvements
### Memory improvements

- `permshap()` and `kernelshap()` require about 10% less memory ([#166](https://github.com/ModelOriented/kernelshap/pull/166)).
- `permshap()` and `kernelshap()` are faster for data.frame input,
and slightly slower for matrix input ([#166](https://github.com/ModelOriented/kernelshap/pull/166)).
- Additionally, `permshap(, exact = TRUE)` is faster by pre-calculating more
elements used across rows ([#165](https://github.com/ModelOriented/kernelshap/pull/165)).

### Internal changes

- Matrices holding on-off vectors are now consistently of type logical ([#167](https://github.com/ModelOriented/kernelshap/pull/167)).
- `kernelshap()` solver: The Moore-Penrose pseudo-inverse has been replaced by two direct solves, a trick of
[Ian Covert](https://github.com/iancovert/shapley-regression/blob/master/shapreg/shapley.py),
ported to R in [#171](https://github.com/ModelOriented/kernelshap/pull/171).
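The idea behind the solver change can be illustrated in a few lines: the constrained weighted least-squares system is solved with two calls to `solve()` instead of forming a pseudo-inverse (a simplified sketch, not the package's actual code; `A`, `b`, and `total` are hypothetical inputs):

```r
# Solve A %*% beta = b subject to sum(beta) = total via two direct solves,
# avoiding the Moore-Penrose pseudo-inverse (simplified sketch)
solve_constrained <- function(A, b, total) {
  Ainv_b <- solve(A, b)                # first direct solve
  Ainv_1 <- solve(A, rep(1, nrow(A)))  # second direct solve
  # Shift the solution so its components sum to 'total' (efficiency constraint)
  Ainv_b - Ainv_1 * (sum(Ainv_b) - total) / sum(Ainv_1)
}
```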

### Changes in parallelism

We have switched from `%dopar%` to `doFuture` ([#170](https://github.com/ModelOriented/kernelshap/pull/170)) with the following impact:

- No need for calling `registerDoFuture()` anymore.
- Random seeding is properly handled, and respects `seed`, thanks [#163](https://github.com/ModelOriented/kernelshap/issues/163) for reporting.
- If missing packages or globals have to be specified, this now has to be done through `parallel_args = list(packages = ..., globals = ...)`
instead of `parallel_args = list(.packages = ..., .globals = ...)`. The list is passed to `[foreach::foreach(.options.future = ...)]`.

### Dependencies
### Change in dependencies

- {MASS}: Dropped from imports
- {doFuture}: suggests -> imports
- {MASS}: Dropped from Imports
- {foreach}: Dropped from Imports
- {doFuture}: Dropped from Suggests and replaced by {future.apply} in Imports
- {progressr}: New in Suggests

# kernelshap 0.8.0

104 changes: 46 additions & 58 deletions R/kernelshap.R
@@ -8,6 +8,11 @@
#' Otherwise, a partly exact hybrid algorithm combining exact calculations and
#' iterative paired sampling is used, see Details.
#'
#' To activate the progress bar, e.g., run `progressr::handlers(global = TRUE)` first.
#'
#' To activate parallel processing, run `future::plan(multisession)` or similar.
#' To deactivate later, run `future::plan("sequential")`.
#'
#' @details
#' The pure iterative Kernel SHAP sampling as in Covert and Lee (2021) works like this:
#'
@@ -53,8 +58,6 @@
#' should not be higher than 10 for exact calculations.
#' For similar reasons, degree 2 hybrids should not use \eqn{p} larger than 40.
#'
#' @importFrom doFuture %dofuture%
#'
#' @param object Fitted model object.
#' @param X \eqn{(n \times p)} matrix or `data.frame` with rows to be explained.
#' The columns should only represent model features, not the response
@@ -104,16 +107,10 @@
#' For `permshap()`, the default is 0.01, while for `kernelshap()` it is set to 0.005.
#' @param max_iter If the stopping criterion (see `tol`) is not reached after
#' `max_iter` iterations, the algorithm stops. Ignored if `exact = TRUE`.
#' @param parallel If `TRUE`, use [foreach::foreach()] and `%dofuture%` to loop over rows
#' to be explained. Must register backend beforehand, e.g., `plan(multisession)`,
#' see README for an example. Currently disables the progress bar.
#' @param parallel_args Named list of arguments passed to
#' `foreach::foreach(.options.future = ...)`, ideally `NULL` (default).
#' Only relevant if `parallel = TRUE`.
#' Example on Windows: if `object` is a GAM fitted with package 'mgcv',
#' then one might need to set `parallel_args = list(packages = "mgcv")`.
#' Similarly, if the model has been fitted with `ranger()`, then it might be necessary
#' to pass `parallel_args = list(packages = "ranger")`.
#' @param parallel Deprecated.
#' @param parallel_args Deprecated (see `future.packages`).
#' @param future.packages Character vector with packages to be attached in the
#' R environment evaluating the future. Only relevant for parallel computing.
#' @param verbose Set to `FALSE` to suppress messages and the progress bar.
#' @param seed Optional integer random seed. Note that it changes the global seed.
#' @param survival Should cumulative hazards ("chf", default) or survival
@@ -195,9 +192,19 @@ kernelshap.default <- function(
max_iter = 100L,
parallel = FALSE,
parallel_args = NULL,
future.packages = NULL,
verbose = TRUE,
seed = NULL,
...) {
if (parallel) {
  warning(
    "The 'parallel' argument has been deprecated. ",
    "Simply call future::plan() to activate parallel computing."
  )
}
if (!is.null(parallel_args)) {
  warning(
    "The 'parallel_args' argument has been deprecated. ",
    "If your parallel sessions lack a package, use the argument 'future.packages'."
  )
}
p <- length(feature_names)
basic_checks(X = X, feature_names = feature_names, pred_fun = pred_fun)
stopifnot(
@@ -258,54 +265,35 @@ kernelshap.default <- function(
warning_burden(max(m, m_exact), bg_n = bg_n)
}

# Apply Kernel SHAP to each row of X
if (isTRUE(parallel)) {
future_args <- c(list(seed = TRUE), parallel_args)
parallel_args <- c(list(i = seq_len(n)), list(.options.future = future_args))
res <- do.call(foreach::foreach, parallel_args) %dofuture% kernelshap_one(
x = X[i, , drop = FALSE],
v1 = v1[i, , drop = FALSE],
object = object,
pred_fun = pred_fun,
feature_names = feature_names,
bg_w = bg_w,
exact = exact,
deg = hybrid_degree,
m = m,
tol = tol,
max_iter = max_iter,
v0 = v0,
precalc = precalc,
...
)
} else {
if (verbose && n >= 2L) {
pb <- utils::txtProgressBar(max = n, style = 3)
}
res <- vector("list", n)
for (i in seq_len(n)) {
res[[i]] <- kernelshap_one(
x = X[i, , drop = FALSE],
v1 = v1[i, , drop = FALSE],
object = object,
pred_fun = pred_fun,
feature_names = feature_names,
bg_w = bg_w,
exact = exact,
deg = hybrid_degree,
m = m,
tol = tol,
max_iter = max_iter,
v0 = v0,
precalc = precalc,
...
)
if (verbose && n >= 2L) {
utils::setTxtProgressBar(pb, i)
}
}
pbar_step <- max(1L, n %/% 20L)
pbar <- if (verbose && n >= 2L && requireNamespace("progressr", quietly = TRUE)) {
progressr::progressor(ceiling(n / pbar_step))
}

# Apply Kernel SHAP to each row of X
res <- future.apply::future_lapply(
seq_len(n),
FUN = kernelshap_one,
future.packages = future.packages,
future.seed = TRUE,
x = X,
v1 = v1,
object = object,
pred_fun = pred_fun,
feature_names = feature_names,
bg_w = bg_w,
exact = exact,
deg = hybrid_degree,
m = m,
tol = tol,
max_iter = max_iter,
v0 = v0,
precalc = precalc,
pbar = pbar,
pbar_step = pbar_step,
...
)

# Organize output
exact <- exact || trunc(p / 2) == hybrid_degree

85 changes: 41 additions & 44 deletions R/permshap.R
@@ -9,6 +9,11 @@
#' Otherwise, the sampling process iterates until the resulting values
#' are sufficiently precise, and standard errors are provided.
#'
#' To activate the progress bar, e.g., run `progressr::handlers(global = TRUE)` first.
#'
#' To activate parallel processing, run `future::plan(multisession)` or similar.
#' To deactivate later, run `future::plan("sequential")`.
#'
#' @details
#' During each iteration, the algorithm cycles twice through a random permutation:
#' It starts with all feature components "turned on" (i.e., taking them
@@ -105,9 +110,19 @@ permshap.default <- function(
max_iter = 10L * length(feature_names),
parallel = FALSE,
parallel_args = NULL,
future.packages = NULL,
verbose = TRUE,
seed = NULL,
...) {
if (parallel) {
  warning(
    "The 'parallel' argument has been deprecated. ",
    "Simply call future::plan() to activate parallel computing."
  )
}
if (!is.null(parallel_args)) {
  warning(
    "The 'parallel_args' argument has been deprecated. ",
    "If your parallel sessions lack a package, use the argument 'future.packages'."
  )
}
p <- length(feature_names)
if (p <= 1L) {
stop("Case p = 1 not implemented. Use kernelshap() instead.")
@@ -167,52 +182,34 @@
warning_burden(max(m_eval, m_exact), bg_n = bg_n)
}

# Apply permutation SHAP to each row of X
if (isTRUE(parallel)) {
future_args <- c(list(seed = TRUE), parallel_args)
parallel_args <- c(list(i = seq_len(n)), list(.options.future = future_args))
res <- do.call(foreach::foreach, parallel_args) %dofuture% permshap_one(
x = X[i, , drop = FALSE],
v1 = v1[i, , drop = FALSE],
object = object,
pred_fun = pred_fun,
bg_w = bg_w,
v0 = v0,
precalc = precalc,
feature_names = feature_names,
exact = exact,
low_memory = low_memory,
tol = tol,
max_iter = max_iter,
...
)
} else {
if (verbose && n >= 2L) {
pb <- utils::txtProgressBar(max = n, style = 3)
}
res <- vector("list", n)
for (i in seq_len(n)) {
res[[i]] <- permshap_one(
x = X[i, , drop = FALSE],
v1 = v1[i, , drop = FALSE],
object = object,
pred_fun = pred_fun,
bg_w = bg_w,
v0 = v0,
precalc = precalc,
feature_names = feature_names,
exact = exact,
low_memory = low_memory,
tol = tol,
max_iter = max_iter,
...
)
if (verbose && n >= 2L) {
utils::setTxtProgressBar(pb, i)
}
}
pbar_step <- max(1L, n %/% 20L)
pbar <- if (verbose && n >= 2L && requireNamespace("progressr", quietly = TRUE)) {
progressr::progressor(ceiling(n / pbar_step))
}

# Apply permutation SHAP to each row of X
res <- future.apply::future_lapply(
seq_len(n),
FUN = permshap_one,
future.packages = future.packages,
future.seed = TRUE,
x = X,
v1 = v1,
object = object,
pred_fun = pred_fun,
bg_w = bg_w,
v0 = v0,
precalc = precalc,
feature_names = feature_names,
exact = exact,
low_memory = low_memory,
tol = tol,
max_iter = max_iter,
pbar = pbar,
pbar_step = pbar_step,
...
)

# Organize output
out <- list(
S = reorganize_list(lapply(res, `[[`, "beta")),