Optional removal of redundant columns #42
Comments
The reason patsy doesn't do this is just that I'm not sure what the right API approach is :-) If you have a "regularized regression" model that takes a formula and always wants redundancy removal disabled, then you want something like an extra option to [...] If you want to be able to turn this on and off at will as a user, then you want something inside the formula itself to determine the behaviour. (An idea I've played around with in the past is to have formulas like [...]) The very simple suggestion in #60 would at least make this possible, albeit awkwardly. (You'd have to explicitly override patsy's default redundancy removal on a factor-by-factor basis, [...]) CC @josef-pkt
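For readers wondering what a factor-by-factor override could look like today, here is a minimal sketch using patsy's documented custom-contrast hook (an object with `code_with_intercept` and `code_without_intercept` methods). The class name `FullRankOneHot` and the toy data are made up for illustration, and this is not necessarily what #60 proposes:

```python
import numpy as np
from patsy import ContrastMatrix, dmatrix


class FullRankOneHot:
    """Hypothetical contrast: one indicator column per level, never dropping any."""

    def _code(self, levels):
        # Identity matrix = one column per level; suffixes name the columns.
        suffixes = ["[%s]" % (level,) for level in levels]
        return ContrastMatrix(np.eye(len(levels)), suffixes)

    # Return the same full coding whether or not patsy asks for a reduced one.
    def code_with_intercept(self, levels):
        return self._code(levels)

    def code_without_intercept(self, levels):
        return self._code(levels)


data = {"a": ["x", "y", "z", "x"]}  # toy data, assumed for this example

default = dmatrix("a", data)                  # intercept + reduced coding
full = dmatrix("C(a, FullRankOneHot)", data)  # intercept + all three levels

print(default.design_info.column_names)
print(full.design_info.column_names)
```

The override has to be repeated on every categorical factor in the formula, which is the awkwardness mentioned above.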
I'd like this.
Yes.
That sounds good too. Patsy's great, thanks!
In statsmodels, most regularization and constraints will optionally apply to only some terms, e.g. in GAM we only penalize the splines. So for that use case it would be easier to control the factor codings directly in the formula itself. (However, it should be easy to fix with a small global penalty if the option affects all terms in a formula.) About removing all constant effects: another option would be to introduce something like [...]
On the question of specifying this stuff in the formula vs. as an argument to [...] Really, both seem helpful. If there are good ways to specify it in the formula, I imagine most direct users of patsy would do that, but it's good for libraries using patsy to be able to set up a different default behavior for their problem/domain/context using an argument to [...]
Another use case for this is in a "predict" function where you are taking a fitted model and using it to predict at a new set of points. Since no model is being fit, there is no need for the design matrix to be nonsingular. I have had a lot of trouble with this when doing predictions with formulas in statsmodels.
@kshedden: you mean, you have some model that you fit without using Patsy, with some sort of redundant coding scheme, and now you want to use Patsy for doing predictions, so you need to convince Patsy to match whatever thing was done originally?
Sorry for the noise, I was using a modified version of patsy and confused myself, all is fine.
Patsy automatically removes redundant (linearly dependent) columns so that the final matrix is not overdetermined. Is there an option to turn off the removal? I would like to use patsy formulas for regularized linear regression, and for that I need all the columns, even if they repeat.
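To illustrate why the redundant columns are harmless under regularization, here is a small NumPy-only sketch. It builds an intercept plus a full set of indicator columns by hand (it does not use patsy, and the data and penalty value are made up), shows that the matrix is rank-deficient, and then solves a ridge problem on it anyway:

```python
import numpy as np

# Toy categorical variable and response, assumed purely for illustration.
levels = ["x", "y", "z", "x", "y", "z"]
y = np.array([1.0, 2.0, 3.0, 1.2, 2.1, 2.9])

# Intercept plus one indicator column per level: the indicators sum to the
# intercept column, so the four columns are linearly dependent.
names = sorted(set(levels))
dummies = np.array([[1.0 if v == n else 0.0 for n in names] for v in levels])
X = np.column_stack([np.ones(len(levels)), dummies])

rank = np.linalg.matrix_rank(X)
print(X.shape, rank)  # four columns, but fewer independent ones

# Ridge regression: X'X is singular here, but X'X + lam*I is invertible
# for any lam > 0, so the penalized problem has a unique solution.
lam = 0.1  # hypothetical penalty strength
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(beta)
```

Without the penalty term, the normal equations for this `X` have no unique solution, which is what the default column removal guards against.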