Simpler term names #19

jtornero · 2013-05-24T11:49:59Z

First I want to thank the developer team for their excellent work.

Well, I feel that the "predictor" names you get when using, for instance, C(predictor, Treatment(5)) are too long and somewhat confusing. When you make interactions between predictors, you get things like:

C(trimcod, Treatment(4))[T.3]:C(flota, Treatment(11))[T.10]

It would be nice to be able to assign an alias or just forget all the stuff apart from the predictor name to get something like:

[trimcod][T.3]

or just [trimcod 3]

I've been playing with the MyTreat example but I can't get any positive results.

Thank you very much

Jorge Tornero

njsmith · 2013-05-24T12:58:23Z

There are two parts to the name -- the "C(trimcod, Treatment(4))" is the
literal Python code that was executed to get the variable, and the "[T.3]"
part is added on by the categorical variable coder.

There really isn't any way to pull out "trimcod" from "C(trimcod,
Treatment(4))", because that would require parsing Python source code...
note that C() and Treatment() are just regular Python functions.

In the short run, you can store the output of C() to a temporary variable
with whatever name you want, and use that in your formula:

import patsy
# Store each coded factor under a short name; that name becomes the term name.
Ctrimcod = patsy.C(trimcod, patsy.Treatment(4))
Cflota = patsy.C(flota, patsy.Treatment(11))
lm("y ~ Ctrimcod * Cflota", ...)

But Ctrimcod and Cflota will be strange opaque objects that you can't do
much else with, so you'll want to keep the original variables around as
well.

The real solution in the long run will be to implement a proper data type
in Python for storing categorical data, and which can have default coding
options attached to it -- basically turning the output of C() into an
object that's actually useful. That's how this stuff works in R -- if you
store your data as a "factor" object, you can attach the equivalent of
Treatment(4) to it directly. But this will take a while, since it needs
enhancements in numpy, in pandas, etc.

jtornero · 2013-05-24T13:11:04Z

Thank you very much for your fast answer.

I've made a workaround by converting the dmatrices (or dmatrix) output to a pandas DataFrame and substituting text in its column names. For example, if predictors is our DataFrame:

newcol=predictors.rename(columns=lambda x: str.replace(x,'C(anocod, Treatment(23))','ANOCOD'))
newcol=newcol.rename(columns=lambda x: str.replace(x,'C(trimcod, Treatment(4))','TRIMCOD'))
newcol=newcol.rename(columns=lambda x: str.replace(x,'C(flota, Treatment(11))','FLEETYPE'))

and then

predictors.columns=newcol.columns
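The same substitution can be sketched on a plain list of column names, using only the standard library. The helper name rename_columns and the ALIASES table are hypothetical, introduced here for illustration; the resulting list could then be assigned to predictors.columns:

```python
# Hypothetical alias table: patsy factor code -> short display name.
ALIASES = {
    "C(anocod, Treatment(23))": "ANOCOD",
    "C(trimcod, Treatment(4))": "TRIMCOD",
    "C(flota, Treatment(11))": "FLEETYPE",
}


def rename_columns(names, aliases=ALIASES):
    """Return a new list with every factor expression replaced by its alias."""
    renamed = []
    for name in names:
        for factor_code, alias in aliases.items():
            name = name.replace(factor_code, alias)
        renamed.append(name)
    return renamed


columns = [
    "Intercept",
    "C(trimcod, Treatment(4))[T.3]",
    "C(trimcod, Treatment(4))[T.3]:C(flota, Treatment(11))[T.10]",
]
short = rename_columns(columns)
print(short)
# -> ['Intercept', 'TRIMCOD[T.3]', 'TRIMCOD[T.3]:FLEETYPE[T.10]']
```

The match is on the exact factor text, so the keys have to be spelled exactly as patsy prints them, including spaces.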

Cheers

Jorge Tornero

njsmith · 2013-05-24T18:26:58Z

Looks like a good workaround. The only thing to watch out for here is that
the .design_info attribute on your design matrix will still have the old
term names in it. Right now that probably won't affect anything, but
someday statsmodels and friends will probably be smart enough to use that
metadata for various things, so keep an eye out for that.

(This comment is partly directed at people who google this thread years
from now.)

jtornero · 2013-05-28T08:21:07Z

Well, I've been playing a little with git and I've been able to modify the source code in my local repo. The idea is to pass an additional parameter to Treatment, say display_name, that overrides, if provided, the default name produced by Treatment, i.e., [T.1], with whatever you want. It looks very nice, but I don't know how to separate the "very first part" from the "Treatment" part. I guessed that it would be stored somewhere in the .design_info variables, but I haven't been able to find it. Any suggestions on how to proceed?

I see that modifying the source is a step further, and more dangerous, than implementing the MyTreat recipe from the documentation, but I haven't been able to make that sort of construction work, sorry.

Thank you very much

Jorge Tornero

njsmith · 2013-06-10T19:43:10Z

I'm afraid I don't really understand what you're asking. A coding class like Treatment gets to set the [T.1] part to be whatever it wants. You can easily do that with a custom class like MyTreat too, though; the built-in classes like Treatment just use the same APIs that you can use yourself in a custom class. You don't have to separate the "very first part" from the [T.1] part, because you can only affect the [T.1] part. The rest comes from the factor's .name (https://patsy.readthedocs.org/en/latest/expert-model-specification.html#patsy.factor_protocol.name). The .design_info variables don't have anything to do with this AFAICT; that's just where patsy puts the names after it has figured them out, to pass them back to the user.

jtornero · 2013-06-17T06:56:46Z

Dear njsmith,

I'm sorry, maybe I mixed up statsmodels and patsy. What I meant concerns the names that appear in statsmodels.GLMResults.summary(). Those are the names I referred to in my first message, such as

C(trimcod, Treatment(4))[T.3]:C(flota, Treatment(11))[T.10]

for instance.

That's why I asked for simpler names. AFAIK those names are stored in design_info.column_names, but they can't be modified by the user. However, if you get pandas DataFrames instead of design matrices as output from patsy.dmatrices(), you can tweak those names by replacing text in the DataFrame's column list with whatever you want.

So what I am asking for is:

A way to rename the output column names to whatever you want; maybe something like a design_info.setColumnNames that works for both design matrix output and DataFrame output. Or just an additional parameter in patsy.dmatrix and/or patsy.dmatrices to provide a list of column names, that is, display names.

I brought up Treatment because it is a nice way to, at least, provide nicer names for some part of the final column names in the dmatrices/dmatrix output. Maybe an option in dmatrices/dmatrix like columns_names_from_treatment=True could also do the trick.

I provide you with a little example... with three or four interactions, the GLMResults.summary gets a little confusing:

C(trimcod, Treatment(4))[T.3]:C(flota, Treatment(11))[T.10]:C(flota, Treatment(6))[T.11]

When the relevant information is that the term is formed by the interaction of

trimcod 3, flota 11 and flota 6
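As a rough illustration of the shortened form requested here, the long names can be post-processed with a regular expression. This is only a heuristic sketch, not part of patsy: it assumes every factor is written exactly as C(name, ...) with a [T.level] suffix, it does not parse Python code in general, and the function name simplify is hypothetical:

```python
import re

# Heuristic: "C(<name>, <coding>)[T.<level>]", where the coding part
# contains no "[", "]" or ":" characters.
_CODED_FACTOR = re.compile(r"\bC\(\s*(\w+)\s*(?:,[^\[\]:]*)?\)\[T\.([^\]]+)\]")


def simplify(column_name):
    """Rewrite each coded factor in a column name as '<name> <level>'."""
    return _CODED_FACTOR.sub(r"\1 \2", column_name)


long_name = "C(trimcod, Treatment(4))[T.3]:C(flota, Treatment(11))[T.10]"
print(simplify(long_name))
# -> trimcod 3:flota 10
```

Names that do not match the pattern, such as Intercept, are returned unchanged.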

I hope I have explained myself better this time. Sorry for the inconveniences.

Jorge Tornero
