Complex Pipeline process & show_prediction #213
@armgilles Wow, thanks for a great example! It would be great to try applying eli5 in this case. Could you post a complete notebook somewhere, if that's convenient for you?
Sure, but I can't share the data... I can use the classic
@armgilles Ah, sorry, I thought it was based directly on the Titanic tutorial. I think what you provided is already enough.
Currently, Pipeline support is not implemented for explain_prediction: it is implemented only for explain_weights, which is why #15 is still open. Could you try passing
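Because explain_prediction does not unwrap a Pipeline yet, one workaround is to split the pipeline by hand: treat every step except the last as the vectorizer and explain only the final estimator. The sketch below assumes a two-step text pipeline (TfidfVectorizer + LogisticRegression) on made-up documents; it does not call eli5, it only checks that the split pieces reproduce the pipeline's own score.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up toy documents and labels, for illustration only.
docs = ["good movie", "great film", "bad movie", "awful film",
        "good plot", "bad acting"]
labels = [1, 1, 0, 0, 1, 0]

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(docs, labels)

# Split the fitted pipeline: first step = vectorizer, last step = estimator.
vec = pipeline.steps[0][1]
clf = pipeline.steps[-1][1]

doc = "good film"
X_doc = vec.transform([doc])  # sparse matrix with exactly one row

# The split must give the same decision score as the full pipeline.
split_score = clf.decision_function(X_doc)
full_score = pipeline.decision_function([doc])
print(np.allclose(split_score, full_score))  # True
```

With eli5, the two pieces would then be passed separately, e.g. `eli5.show_prediction(clf, doc, vec=vec)`, which is how eli5 is usually told about a vectorizer that is not part of the estimator.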
Nope, it doesn't work:
Some information:
Hey, I created a notebook with the Titanic dataset using this kind of pipeline. I use a specific function to build some features and then apply my pipeline. Let me know if I can help with anything.
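One way to keep a custom feature-building function inside the pipeline, so that training and prediction see exactly the same preprocessing, is scikit-learn's FunctionTransformer. This is only a sketch: `build_features` is a hypothetical stand-in for the notebook's function, and the rows below are made up, using Titanic-style column names.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def build_features(df):
    # Hypothetical feature builder: derive FamilySize from two Titanic-style
    # columns and keep only the numeric columns the model uses.
    out = df.copy()
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    return out[["Age", "Fare", "FamilySize"]]


# Made-up rows with Titanic-like column names, for illustration only.
train = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
    "SibSp": [1, 1, 0, 1, 0, 3],
    "Parch": [0, 0, 0, 0, 0, 1],
})
y = [0, 1, 1, 1, 0, 0]

pipeline = make_pipeline(
    # validate=False lets the DataFrame reach build_features unchanged.
    FunctionTransformer(build_features, validate=False),
    StandardScaler(),
    LogisticRegression(),
)
pipeline.fit(train, y)

# Raw rows go in; feature building happens inside the pipeline.
proba = pipeline.predict_proba(train)
print(proba.shape)  # (6, 2)
```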
See GH-213 - this makes this example work, but I'm not sure if this is the correct thing to do.
Thanks for the example, @armgilles! Actually, this last error is related to how we handle pandas DataFrames. Currently we assume that the vectorizer can handle a list of inputs as its input, but that is not the case here. One way to make your example work with the current eli5 is to pass an already vectorized document:
There is also a way to make your original example work (a9ec021), but I'm not sure it's consistent with our API: currently we always advise passing a single document, not a container of length 1. To be fair, passing
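The difference between "a single document" and "a container of length 1" is easy to see with pandas: `df.iloc[i]` returns a Series (the row flattened), while `df.iloc[[i]]` returns a one-row DataFrame, which is what a column-based vectorizer expects. Transforming that one-row DataFrame first yields an already vectorized document, as suggested above. The sketch assumes scikit-learn's ColumnTransformer (not necessarily what the notebook uses) and made-up data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler

# Made-up data: one text column and one numeric column.
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen", "Cumings, Mrs. John", "Heikkinen, Miss. Laina"],
    "Fare": [7.25, 71.28, 7.92],
})

vec = ColumnTransformer([
    ("name", TfidfVectorizer(), "Name"),   # scalar column name -> 1D text input
    ("fare", StandardScaler(), ["Fare"]),  # list of names -> 2D numeric input
])
vec.fit(df)

row_series = df.iloc[0]   # a Series: one row, not a container of rows
row_frame = df.iloc[[0]]  # a one-row DataFrame: what the vectorizer expects
print(type(row_series).__name__, type(row_frame).__name__)  # Series DataFrame

# Vectorize up front; the result is a 1 x n_features matrix that the
# final estimator can consume directly.
X_doc = vec.transform(row_frame)
print(X_doc.shape[0])  # 1
```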
I have a strange bug in this notebook when I fit my model (a simple XGBoost model, no pipeline). The prediction for one row is shown as y=1 (probability 0.061), which is wrong here: it should be y=0. If I force ... To fix it, I have to set ... Did I miss something? I could open a new issue if that makes it easier to follow.
@armgilles Currently, y=1 is shown for binary classifiers in all cases, but @kmike is working on this issue: #223
@armgilles If you have a binary classification task with class names (e.g. "red" and "blue"), it is not that bad -
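For a binary linear classifier, the y=0 view carries no extra information: scikit-learn's LogisticRegression computes P(y=1) as the sigmoid of a single decision score, P(y=0) = 1 - P(y=1), and each feature's contribution towards y=0 is the contribution towards y=1 with the sign flipped. So a y=1 probability of 0.061 means y=0 at 0.939. The check below uses LogisticRegression on a built-in dataset, not the notebook's XGBoost model.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

scaler, clf = model.steps[0][1], model.steps[-1][1]
x = scaler.transform(X[:1])[0]  # first sample, already scaled

# A binary model has one weight vector: these are contributions towards y=1.
contrib_pos = clf.coef_[0] * x
score = contrib_pos.sum() + clf.intercept_[0]

# Contributions towards y=0 are the same numbers with the sign flipped.
contrib_neg = -contrib_pos

p1 = 1.0 / (1.0 + np.exp(-score))  # sigmoid of the decision score
proba = model.predict_proba(X[:1])[0]
print(np.isclose(proba[1], p1), np.isclose(proba[0], 1 - p1))  # True True
```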
I'm trying a simple pipeline:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
import eli5

fnames = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
tnames = ["Setosa", "Versicolour", "Virginica"]

# Load iris and hold out 20% of the samples for testing.
Xs, ys = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(Xs, ys, shuffle=True, test_size=0.2)

# Scale the features, then fit a logistic regression on the scaled values.
scaler = StandardScaler()
lr = LogisticRegression()
pipeline = make_pipeline(scaler, lr)
pipeline.fit(X_train, y_train)

# Pick one random test sample and explain the pipeline's prediction for it.
random_sample = np.random.randint(len(X_test))
doc = X_test[random_sample]
eli5.show_prediction(pipeline, doc, feature_names=fnames, target_names=tnames)
```

The error is:
I did the following to get it to work:
Version: 0.8
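What show_prediction reports for a linear model can be reproduced by hand for this iris pipeline: scale the sample with the fitted StandardScaler, then multiply it element-wise by the coefficient row of the predicted class. This sketch uses plain scikit-learn and NumPy; it is not eli5's internal code and not necessarily the workaround used above. `random_state=0`, `max_iter=1000` and the fixed test sample are added only for reproducibility, and the intercept is labelled `<BIAS>` as eli5 does.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

fnames = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
tnames = ["Setosa", "Versicolour", "Virginica"]

Xs, ys = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    Xs, ys, shuffle=True, test_size=0.2, random_state=0)
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)

scaler = pipeline.steps[0][1]
lr = pipeline.steps[-1][1]
doc = X_test[0]

# Scale the single sample as a 1-row 2D array, then take the row back.
x_scaled = scaler.transform(doc.reshape(1, -1))[0]

# Index of the highest-scoring (i.e. predicted) class.
k = int(np.argmax(lr.decision_function(x_scaled.reshape(1, -1))[0]))

# Per-feature contributions to that class's decision score.
contributions = lr.coef_[k] * x_scaled
score = contributions.sum() + lr.intercept_[k]

print("y=%s" % tnames[k])
for name, value in sorted(zip(fnames, contributions), key=lambda t: -abs(t[1])):
    print("%+.3f  %s" % (value, name))
print("%+.3f  <BIAS>" % lr.intercept_[k])
```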
Hi,
I'm struggling to use show_prediction with a more complex pipeline and heterogeneous data... I know it is a pretty hot topic in scikit-learn and eli5. I would like to use it like your Titanic example, but with more than one text column.
I have tried many things, but I'm stuck here...