investigate SHAP feature importances #240
Hi ELI5 community, @kmike @lopuhin,

I would like to add SHAP support to ELI5. I will start with the small issues to get to know eli5 better. If there is any other information you need, please let me know.
Hi @AshwinB-hat, that's great!

Since SHAP does support CatBoost, it would be awesome to have explain_prediction for CatBoost via SHAP.
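A minimal sketch of what that could build on: CatBoost can compute SHAP values natively, without the shap package. The toy data and model settings below are illustrative assumptions, not anything from this thread.

```python
# Sketch: pulling SHAP values straight out of CatBoost.
# Dataset and hyperparameters are made up for illustration.
import numpy as np
from catboost import CatBoostClassifier, Pool

X = np.random.rand(100, 4)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

model = CatBoostClassifier(iterations=50, verbose=False)
model.fit(X, y)

# Shape: (n_samples, n_features + 1); the last column is the expected value (bias).
shap_values = model.get_feature_importance(Pool(X, y), type='ShapValues')
print(shap_values.shape)  # (100, 5)
```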
Thanks for the prompt reply, @lopuhin!
Hey @lopuhin @kmike @ivanprado, regarding the SHAP integration I had two doubts.
Hey @AshwinB-hat, great questions! @kmike is the primary mentor for this project, but from my point of view:
As far as I have inferred, the second question is cleared up. I will look into optimizing (KNN) for SHAP, or a workaround. LIME can be considered a separate project. Regarding the first question: implementing the SHAP features can be done, but the issue is that we would have to do custom C++ implementations to optimize. I personally think the pain of extracting features would be bearable, as opposed to reverse-engineering the current shap and implementing it again. Your suggestions would be of great help here. @lopuhin @kmike @ivanprado
@AshwinB-hat I'll be sure to read up more on the different algorithms SHAP uses - but my idea is that the primary goal is to use the SHAP library implementation mentioned in this section: https://github.com/slundberg/shap#tree-ensemble-example-with-treeexplainer-xgboostlightgbmcatboostscikit-learn-models (fast C++ implementations are supported for XGBoost, LightGBM, CatBoost, and scikit-learn tree models), and we didn't think of doing any custom implementation in eli5.
@lopuhin Sure. I will look into extracting the feature importance from the shap implementation. Thanks!
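For reference, a minimal sketch of the TreeExplainer path from the section linked above, assuming a fitted scikit-learn tree ensemble (the model and data are illustrative, not from this thread):

```python
# Sketch: shap's TreeExplainer (the fast C++ path) on a scikit-learn ensemble.
import shap
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50).fit(X, y)

explainer = shap.TreeExplainer(model)
# For multiclass models this is per-class SHAP values; depending on the shap
# version it comes back as a list of (n_samples, n_features) arrays or as a
# single stacked array.
shap_values = explainer.shap_values(X)
```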
@AshwinB-hat indeed, you are right, I missed this. I would defer this to @kmike to clarify :)
@lopuhin @kmike
Hey @AshwinB-hat! SHAP is an interesting beast, because on the practical side it combines both an algorithm which is LIME-like and an algorithm which is treeinterpreter-like. They have somewhat different use cases and different performance characteristics. Currently in eli5, explain_prediction for all decision tree algorithms uses a treeinterpreter-like algorithm. But SHAP is strictly better, so I think one of the goals should be to switch to SHAP for tree ensembles by default, using the current API. I'm unsure if we should be wrapping the shap package or not. Things to consider:

Maybe we can even do both - have our own implementation of the basics when it is easy, e.g. when most of the work is done by the package we're explaining (e.g. check the pred_contribs and pred_interactions arguments in XGBoost; see the sketch below) - this would allow us to avoid an external dependency in simple cases. At the same time, I don't think we should be re-implementing the whole shap package, so integration with it for more advanced features (like plotting, or other explanation algorithms) is desired as well.
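A minimal sketch of the native XGBoost hooks mentioned above; the data and training parameters are illustrative assumptions:

```python
# Sketch: XGBoost computes SHAP-style contributions natively via
# pred_contribs / pred_interactions, with no dependency on the shap package.
# Data and parameters are made up for illustration.
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 4)
y = (X[:, 0] > 0.5).astype(int)

dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({'objective': 'binary:logistic'}, dtrain, num_boost_round=20)

# Shape: (n_samples, n_features + 1); the last column is the bias term.
contribs = booster.predict(dtrain, pred_contribs=True)

# Shape: (n_samples, n_features + 1, n_features + 1) - pairwise interactions.
interactions = booster.predict(dtrain, pred_interactions=True)
```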
Hey @kmike, based on the SHAP values the further estimations are done. I'm currently looking at the XGBoost docs for their native implementation, but I am unable to find any example or implementation that does not use the shap library. It would be helpful if you could link some resources. Also, I have written a mock draft for the GSoC proposal due tomorrow. I would be grateful if you could point out flaws and areas I can improve on. I have only considered wrapping the shap library so far, as I could not find a good reason not to. Your suggestions would be valuable. Thanks.
There is a recent paper which explains how to do explain_prediction for trees and tree ensembles, which the authors claim to be better than treeinterpreter-like measures: https://arxiv.org/pdf/1706.06060.pdf. It is already implemented in LightGBM (microsoft/LightGBM#825) and XGBoost (dmlc/xgboost#2438). There is also a repo with model-agnostic explanations: https://github.com/slundberg/shap.
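For the LightGBM implementation referenced above, a minimal sketch using pred_contrib; the data and parameters are illustrative assumptions:

```python
# Sketch: LightGBM exposes the same idea via predict(..., pred_contrib=True).
# Data and parameters are made up for illustration.
import numpy as np
import lightgbm as lgb

X = np.random.rand(100, 4)
y = (X[:, 0] > 0.5).astype(int)

model = lgb.LGBMClassifier(n_estimators=20).fit(X, y)

# Shape for binary classification: (n_samples, n_features + 1);
# the last column is the expected value (bias).
contribs = model.predict(X, pred_contrib=True)
```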