Categorical encodings, entropic approach #119
You didn't miss anything. I'm not aware of any categorical encoding schemes that use information theory. I skimmed the video, and all I saw was a general discussion of how to calculate entropy, not its connection to encoding categorical variables for use in linear modeling. Is this something you've encountered, and if so, do you have a link for it? (Preferably not video.)
I'll put some links below, but in principle it's replacing the categories with the probability of occurrence of that category when compared against one or more other continuous variables in the dataset. The links point to examples that have been manually worked out. Sorry, I should've given this b-ary entropy encoding background in my original email.
- A more detailed example <https://youtu.be/gmiINKkYcF8?t=3m22s> from another author, fully worked out. He calls it "binning", which is the same as nominal categorization. More importantly, he compares different value ranges to find the one with the best information gain. That is the central reason for categorical encoding in statistical inference: the coding conventions in statistics are just a convenience, but the real goal is finding information gain. (A sketch of this calculation follows the list.)
- Another example <https://youtu.be/LodZWzrbayY?t=8m37s> with 3 categories, shown as part of the definition that carries into follow-up videos by the same author.
- Wikipedia article <https://en.wikipedia.org/wiki/Entropy_(information_theory)#Entropy_as_a_measure_of_diversity>, see the b-ary section.
- Most statistical mechanics textbooks have a chapter about this. I believe Claude Shannon actually used to teach this himself in the first week of classes.
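As a rough illustration of the calculation described above, here is a minimal NumPy sketch that scores candidate split points on a continuous variable by information gain against a categorical variable. The data, the threshold grid, and the function names are made up for illustration; this is not patsy code.

```python
import numpy as np

def entropy(labels, base=2):
    """b-ary Shannon entropy of a categorical array (base b sets the units)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p) / np.log(base))

def information_gain(labels, x, threshold):
    """Gain from splitting the continuous variable x at `threshold`:
    H(labels) minus the weighted entropy of the two sides of the split."""
    left = x <= threshold
    return entropy(labels) - (left.mean() * entropy(labels[left])
                              + (~left).mean() * entropy(labels[~left]))

# Made-up data: a categorical variable and a continuous variable.
rng = np.random.default_rng(0)
category = rng.choice(["a", "b", "c"], size=200)
x = rng.normal(size=200) + (category == "a")  # give one level a shifted distribution

# Compare several candidate split points and keep the one with the best gain.
candidates = np.quantile(x, [0.25, 0.5, 0.75])
gains = [information_gain(category, x, t) for t in candidates]
print("best split:", candidates[int(np.argmax(gains))], "gain:", max(gains))
```

The loop over candidates is just the "compare value ranges by information gain" step; how those gains would then map onto column values in a design matrix is a separate question.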
For implementing this in patsy, I envision one or more categorical variables fed in via the formula, along with a target variable that is continuous. This target variable is what gets the ranges or bins (as in the bins parameter for np.digitize). Patsy would then calculate the maximum "information gain" figure (which is a probability value between 0 and 1) for the categories based on a variety of ranges of the continuous variable. Instead of zeros and ones, it would use the floating-point values for each of the categories.
Hope this helps clarify.
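To make the proposal above concrete, here is one possible reading of it as a rough sketch. The function name `entropic_encode`, the per-level scoring rule, and the example data are illustrative assumptions, not an existing patsy API or a settled design.

```python
import numpy as np
import pandas as pd

def entropy(labels, base=2):
    """b-ary entropy of a discrete array."""
    p = pd.Series(labels).value_counts(normalize=True).to_numpy()
    return -np.sum(p * np.log(p) / np.log(base))

def entropic_encode(df, cat_col, target_col, bins):
    """Hypothetical encoder: replace each level of `cat_col` with a score based
    on how much knowing that level tells you about the binned continuous target."""
    binned = np.digitize(df[target_col], bins)   # discretize the continuous target
    h_total = entropy(binned)
    codes = {}
    for level, sub in df.assign(_bin=binned).groupby(cat_col)["_bin"]:
        # One possible per-level score: H(binned) - H(binned | this level).
        codes[level] = h_total - entropy(sub)
    return df[cat_col].map(codes)                # floats instead of 0/1 dummy columns

# Made-up data, purely for illustration.
df = pd.DataFrame({
    "color": ["red", "red", "blue", "green", "blue", "green"],
    "y":     [1.2,   0.7,   3.5,    2.9,     4.1,    2.2],
})
print(entropic_encode(df, "color", "y", bins=[1.0, 2.0, 3.0]))
```

Whether the per-level score should be a gain, a conditional entropy, or a normalized probability is exactly the kind of detail a concrete worked example would pin down.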
Here it sounds like you're talking about the problem of, given a continuous random variable, finding a discretization that preserves the most information (I guess under the constraint that each discrete value has to correspond to a contiguous range of the continuous space)? Patsy's categorical encodings are about taking a categorical variable and encoding it as a multi-dimensional real vector. I'm not sure what connection you see between these two problems, or whether you mean something else.
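For reference, a minimal example of the kind of encoding being described here, using patsy's default treatment coding on made-up data:

```python
import pandas as pd
from patsy import dmatrix

df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})

# The categorical variable becomes a multi-dimensional real vector:
# an intercept column plus one 0/1 column per non-reference level,
# with columns named like C(color)[T.green] and C(color)[T.red].
X = dmatrix("C(color)", df, return_type="dataframe")
print(X)
```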
The b-ary section just gives the definition of the entropy of a categorical random variable (with b setting the base of the log). I know what entropy is; what I don't know is what it has to do with categorical encodings :-).
Can you give a concrete, worked example?
All three video links I sent have worked examples in them.
I'm asking because I can't make head or tail of anything you've said so far, so just referring back to it isn't going to clarify anything. Also, I'm pretty sure none of those videos had examples of patsy formulas.
One of my favorite parts of patsy is the categorical encodings. That feature alone is worth using it for. While playing with those options, I could not find encodings based on standard entropy calculations. Is that covered in there and I missed it?
For an example of using entropy for categorical encodings, see this video: https://youtu.be/IPkRVpXtbdY?t=4m49s
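For context, the options referred to above are patsy's built-in contrast codings, selectable inside C() in a formula. A minimal illustration with made-up data (none of these are entropy-based):

```python
import pandas as pd
from patsy import dmatrix

df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})

# A few of the contrast codings patsy ships with for categorical variables.
print(dmatrix("C(color, Treatment)", df, return_type="dataframe"))  # dummy (0/1) coding
print(dmatrix("C(color, Sum)", df, return_type="dataframe"))        # deviation coding
print(dmatrix("C(color, Poly)", df, return_type="dataframe"))       # orthogonal polynomial coding
print(dmatrix("C(color, Helmert)", df, return_type="dataframe"))    # Helmert coding
```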