[question] How to get full-ranked categorical data? #94

Closed
shivamkalra opened this issue Oct 5, 2016 · 1 comment

Comments

shivamkalra commented Oct 5, 2016

My data consists of four columns, let's say a, b, c, d, where a and b are categorical with 3 and 5 categories respectively. My formula is d = C(a) + C(b) + c, and I even tried d = 0 + C(a) + C(b) + c, but neither gives me full-ranked data, i.e. with all 3 and all 5 categories present as columns in my final input data. Is there a way to force patsy to give me full-ranked data?

Member

njsmith commented Oct 5, 2016

There isn't any way baked into patsy currently.

You could do it "by hand" by defining a custom contrast coding scheme -- see https://patsy.readthedocs.io/en/latest/categorical-coding.html

Something like C(a, np.eye(3)) for a variable with 3 levels would get you started. It wouldn't be much more work to define a proper coding class that automatically handles different numbers of levels and gives you nicer column names -- search for MyTreat in the docs linked above for an example. And the next step beyond that would be to write up some tests and submit a pull request adding your coding class to patsy itself :-)
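
A rough sketch of both suggestions, in case it helps. The class name `FullDummy` and the toy data are made up for illustration and are not part of patsy; the class just follows the same two-method pattern as the MyTreat example in the docs, returning an identity matrix in both cases so that no level is ever dropped.

```python
import numpy as np
from patsy import ContrastMatrix, balanced, dmatrix


class FullDummy(object):
    """Hypothetical coding: one indicator column per level, none dropped."""

    def _code(self, levels):
        # Identity matrix: level i maps to indicator column i.
        suffixes = ["[%s]" % (level,) for level in levels]
        return ContrastMatrix(np.eye(len(levels)), suffixes)

    # patsy picks one of these two methods depending on whether it wants
    # the term to span the intercept; return the full coding either way.
    code_with_intercept = _code
    code_without_intercept = _code


# Made-up data: a has 3 levels, b has 5 levels, c is numeric.
data = balanced(a=3, b=5)
data["c"] = np.arange(15.0)

# Quick version: a bare identity matrix passed as the contrast.
quick = dmatrix("0 + C(a, np.eye(3))", data)

# Reusable version: handles any number of levels and keeps level names.
design = dmatrix("C(a, FullDummy) + C(b, FullDummy) + c", data)
print(design.design_info.column_names)
```

Note that a design matrix built this way has a column for every category but is not full rank in the linear-algebra sense: each set of indicator columns sums to the intercept column, so whatever you fit downstream needs to cope with that (e.g. through regularization).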

(Closing as a duplicate of #60.)

@njsmith njsmith closed this as completed Oct 5, 2016