Categorical encodings, entropic approach #119

sjLambda · 2018-01-26T22:08:23Z

One of my favorite parts of patsy are the categorical encodings. That feature alone is worth using it. While playing with those options, I could not find encodings based on standard entropy calculations? Is that covered in there and I missed?

For an example of using entropy for categorical encodings, see this video: https://youtu.be/IPkRVpXtbdY?t=4m49s

njsmith · 2018-01-26T23:01:36Z

You didn't miss anything. I'm not aware of any categorical encoding schemes that use information theory. I skimmed the video and all I saw was general discussion about how to calculate entropy; not it's connection to encoding categorical variables for use in linear modeling. Is this a thing you've encountered, and if so, do you have a link for it? (Preferably not video.)

sjLambda · 2018-01-27T04:16:19Z

I'll put some links below, but in principle its replacing the categories with a probability of occurance of that category when compared against one or more other continuous variables in that dataset. I'll put some links below to examples that have been manually worked out. Sorry I should've given this b-ary entropy encoding background in my original email. - This is a more detailed example <https://youtu.be/gmiINKkYcF8?t=3m22s> from another author, fully calculated. He calls it "binning", which is the same as nominal categorization. But more importantly, he compares the different value ranges for best gain in info probability. This is the central reason for categorical encoding in statistical inference. The coding reasons in statistics are just a convenience, but the real reason is finding info gain. - Another example <https://youtu.be/LodZWzrbayY?t=8m37s> with 3 categories shown as part of the definition that goes into other follow up videos by the same author. - Wikipedia article <https://en.wikipedia.org/wiki/Entropy_(information_theory)#Entropy_as_a_measure_of_diversity>, see the b-ary section - Most statistical mechanics text books have a chapter about this. I believe Claude Shannon used to actually teach this in class by himself in the first week of classes. For implementing in patsy, I envsion one or more categorical variables fed via the "formula" along with a target variable that is a continuous variable. This target var is what gets the ranges or bins (as in the bins parameter for np.dititize). Patsy will then calculate max "information gain" figure (which is a probability value between 0 and 1) for the categories based on a vareity of ranges of the continuous variable. Instead of zeros and 1s, it will use the floating point values for each of the categories. Hope this helps clarify.

…

On Fri, Jan 26, 2018 at 3:01 PM, Nathaniel J. Smith < ***@***.***> wrote: You didn't miss anything. I'm not aware of any categorical encoding schemes that use information theory. I skimmed the video and all I saw was general discussion about how to calculate entropy; not it's connection to encoding categorical variables for use in linear modeling. Is this a thing you've encountered, and if so, do you have a link for it? (Preferably not video.) — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#119 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AO-LTI9odPacba4hoKmnthy_wPo8E7eVks5tOllRgaJpZM4Ru8cf> .

njsmith · 2018-01-27T04:59:33Z

He calls it "binning", which is the same as nominal categorization. But more importantly, he compares the different value ranges for best gain in info probability. This is the central reason for categorical encoding in statistical inference. The coding reasons in statistics are just a convenience, but the real reason is finding info gain.

Here it sounds like you're talking about the problem of, given a continuous random variable, find a discretization that preserves the most information (I guess under the constraint that each discrete value has to correspond to a contiguous range of the continuous space)? Patsy's categorical encodings are about taking a categorical variable, and encoding it as a multi-dimensional real vector. I'm not sure what connection you see between these two problems, or whether you mean something else.

Wikipedia article https://en.wikipedia.org/wiki/Entropy_(information_theory)#Entropy_as_a_measure_of_diversity, see the b-ary section

The b-ary section just gives the definition of the entropy of a categorical random variable (with b setting the base of the log). I know what entropy is; what I don't know is what it has to do with categorical encodings :-).

For implementing in patsy, I envsion one or more categorical variables fed via the "formula" along with a target variable that is a continuous variable. This target var is what gets the ranges or bins (as in the bins parameter for np.dititize). Patsy will then calculate max "information gain" figure (which is a probability value between 0 and 1) for the categories based on a vareity of ranges of the continuous variable. Instead of zeros and 1s, it will use the floating point values for each of the categories.

Can you give a concrete, worked example?

sjLambda · 2018-01-27T06:58:22Z

Can you give a concrete, worked example? All three video links I sent have worked examples in them.

…

On Fri, Jan 26, 2018 at 8:59 PM, Nathaniel J. Smith < ***@***.***> wrote: He calls it "binning", which is the same as nominal categorization. But more importantly, he compares the different value ranges for best gain in info probability. This is the central reason for categorical encoding in statistical inference. The coding reasons in statistics are just a convenience, but the real reason is finding info gain. Here it sounds like you're talking about the problem of, given a continuous random variable, find a discretization that preserves the most information (I guess under the constraint that each discrete value has to correspond to a contiguous range of the continuous space)? Patsy's categorical encodings are about taking a categorical variable, and encoding it as a multi-dimensional real vector. I'm not sure what connection you see between these two problems, or whether you mean something else. - Wikipedia article https://en.wikipedia.org/wiki/ Entropy_(information_theory)#Entropy_as_a_measure_of_diversity <https://en.wikipedia.org/wiki/Entropy_(information_theory)#Entropy_as_a_measure_of_diversity>, see the b-ary section The b-ary section just gives the definition of the entropy of a categorical random variable (with b setting the base of the log). I know what entropy is; what I don't know is what it has to do with categorical encodings :-). For implementing in patsy, I envsion one or more categorical variables fed via the "formula" along with a target variable that is a continuous variable. This target var is what gets the ranges or bins (as in the bins parameter for np.dititize). Patsy will then calculate max "information gain" figure (which is a probability value between 0 and 1) for the categories based on a vareity of ranges of the continuous variable. Instead of zeros and 1s, it will use the floating point values for each of the categories. Can you give a concrete, worked example? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#119 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AO-LTKKhFcfIefX4jPKUzv3-ZXvHMt-Zks5tOq02gaJpZM4Ru8cf> .

njsmith · 2018-01-27T08:08:11Z

I'm asking because I can't make any head or tail out of anything you've said so far, so just referring back to it isn't going to clarify anything. Also I'm pretty sure none of those videos had examples of patsy formulas.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Categorical encodings, entropic approach #119

Categorical encodings, entropic approach #119

sjLambda commented Jan 26, 2018

njsmith commented Jan 26, 2018

sjLambda commented Jan 27, 2018 via email

njsmith commented Jan 27, 2018

sjLambda commented Jan 27, 2018 via email

njsmith commented Jan 27, 2018

Categorical encodings, entropic approach #119

Categorical encodings, entropic approach #119

Comments

sjLambda commented Jan 26, 2018

njsmith commented Jan 26, 2018

sjLambda commented Jan 27, 2018 via email

njsmith commented Jan 27, 2018

sjLambda commented Jan 27, 2018 via email

njsmith commented Jan 27, 2018