Transforms like 'scale' need some way to handle missing data #82
FYI -- to paste multi-line code blocks on GitHub, use triple-backquotes. (I just fixed your original post -- if you click "edit" on it you can see how I modified it.)

The main problem you are hitting here is that your differenced column starts with a NaN:

```
In [11]: df["X"].diff()
Out[11]:
0   NaN
1     3
2    -4
3     3
4    -2
5     3
6    -4
7     2
Name: X, dtype: float64
```

Then when you pass that to `scale`, every value comes back as NaN:

```
In [12]: pt.builtins.scale(df["X"].diff())
Out[12]:
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
6   NaN
7   NaN
Name: X, dtype: float128
```

And then patsy's missing-data handling kicks in and throws away all of these NaNs, so you get back a design matrix with zero rows in it. There is also a bug in patsy, which I should fix, where trying to print a design matrix with zero rows throws an error. But that's not really your main problem, it just obscures it :-)
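The NaN propagation described above can be reproduced without patsy. The following is only a rough sketch using plain NumPy: `naive_scale` and `nan_aware_scale` are hypothetical helper names made up for illustration, not patsy functions, and the centering and `ddof=0` rescaling are an assumption about what a "scale"-style transform does rather than patsy's actual implementation.

```python
import numpy as np

def naive_scale(x):
    # Hypothetical helper: center and rescale using the ordinary mean/std.
    # A single NaN makes the mean NaN, so every output value becomes NaN.
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=0)

def nan_aware_scale(x):
    # Hypothetical helper: same idea, but the statistics skip NaNs,
    # so only the positions that were NaN in the input stay NaN.
    x = np.asarray(x, dtype=float)
    return (x - np.nanmean(x)) / np.nanstd(x, ddof=0)

# Same values as the df["X"].diff() output shown in the comment.
diffed = np.array([np.nan, 3, -4, 3, -2, 3, -4, 2])

naive = naive_scale(diffed)          # all NaN
aware = nan_aware_scale(diffed)      # NaN only in the first position
trimmed = naive_scale(diffed[1:])    # drop the leading NaN before scaling

print(naive)
print(aware)
print(trimmed)
```

Either dropping the leading NaN before scaling or computing the statistics with NaN-skipping functions leaves the remaining seven rows usable, which is one possible shape for a workaround.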
Discovered by @rsgmon in pydatagh-82.
The deeper issue, which is a genuine issue, is that |
Thanks, Nathaniel, for the edit tip and the explanation of the underlying issue. I can find a workaround for it now that you've explained it.