Color by factor/group when plotting #12639

Closed
GA-Goig opened this issue Mar 16, 2016 · 5 comments

Comments

@GA-Goig

GA-Goig commented Mar 16, 2016

Hi everyone!

Sorry if this is not the correct place to write this; I'm new to GitHub.

I'm really enjoying pandas, but I think it is missing a basic feature: an easy way to color by factor when plotting a DataFrame. At least, I wasn't able to find how to do this in a "Pythonic" way.

Say we have a DataFrame with records from weather stations. The columns correspond to hundreds of stations, and the rows hold measurements for different days of the month (say 30 rows). There is an additional "Continent" column that indicates which continent each weather station belongs to. Or maybe it is a separate Series converted with astype('category').

The point is that it should be easy to plot this DataFrame with one line per station, while coloring the lines and labeling the legend by group. In this case there are 5 continents, so 5 colors and 5 legend labels.

I'm able to do this with some code, but I think it would be nice to have a feature like:

df.plot(color_by_group=df.Continent) or similar.

PS: Sorry if this, or anything really similar, is already implemented.

@TomAugspurger
Contributor

I think you could already do this with the color parameter, if I understand your question correctly.

That said, you're probably better off using seaborn. Most of its plotting methods take a hue keyword, which would work for you.

@GA-Goig
Author

GA-Goig commented Mar 16, 2016

Thanks a lot for your help! However, I'm not sure I made myself clear before.

Let me show you a real example.

I have a DataFrame of fluorescence values from a chemical reaction, measured at 45 time points for many different samples. With a DataFrame where columns are samples and rows are the measurements at each of the 45 time points, a simple plot looks like this:

df.plot()
figure_1

However, each sample belongs to one of three groups, so I build a dictionary with a color for each one:

gcolors = {'Saliva': '#00B0F6', 'Blood': '#E58700', 'Feces': '#E76BF3'}

And then plot as TomAugspurger suggests, with the color parameter:

df.plot(color=[gcolors[group] for group in SampleGroups])
# SampleGroups is a column holding the group of each sample (something like df.Group)

figure_2
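
For readers who want to try the approach above, here is a minimal self-contained sketch. The random data, the sample names (s1 to s6) and the sample_groups Series are made up for illustration and are not from the original post; only the gcolors dictionary and the color= call come from the thread.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 45 time points (rows) for six samples (columns).
rng = np.random.default_rng(0)
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
df = pd.DataFrame(rng.random((45, 6)).cumsum(axis=0), columns=samples)

# Hypothetical lookup from sample (column) name to its group.
sample_groups = pd.Series(
    ["Saliva", "Saliva", "Blood", "Blood", "Feces", "Feces"], index=samples
)

# Color dictionary from the post above.
gcolors = {"Saliva": "#00B0F6", "Blood": "#E58700", "Feces": "#E76BF3"}

# Build one color per column, in column order, and pass the list to plot().
colors = [gcolors[sample_groups[col]] for col in df.columns]
ax = df.plot(color=colors)
```

Keeping the group lookup indexed by column name means the colors stay aligned with the columns even if the DataFrame is reordered. The default legend still lists every sample, which is the remaining problem described next.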

All that remains is a legend with just three labels, "Saliva", "Blood" and "Feces", in the corresponding colors. (I don't know how to do this yet.)

Anyway, the point is that, if I am not wrong, this would be a nice, simple feature to include in pandas, since I think this is a pretty common way of grouping and representing data.

@TomAugspurger
Contributor

I'd say pass legend=False and then add a legend manually. The 1-column=1-line=1-legend element is pretty central to the basic DataFrame.plot. There should be answers on Stack Overflow for creating a custom matplotlib legend.
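
One possible way to follow this suggestion is sketched below: plot with legend=False, then give matplotlib one proxy Line2D handle per group. The data, the column names and the groups mapping are hypothetical, and this is only one of several ways to build a custom legend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np
import pandas as pd

# Hypothetical mapping from column (sample) name to group.
groups = {"a1": "Saliva", "a2": "Saliva", "b1": "Blood", "c1": "Feces"}
gcolors = {"Saliva": "#00B0F6", "Blood": "#E58700", "Feces": "#E76BF3"}

# Hypothetical data: 45 time points for four samples.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((45, len(groups))), columns=list(groups))

# Color each line by its group and suppress the per-column legend.
ax = df.plot(color=[gcolors[groups[col]] for col in df.columns], legend=False)

# Proxy artists: one legend entry per group rather than one per column.
handles = [
    Line2D([0], [0], color=color, label=name) for name, color in gcolors.items()
]
legend = ax.legend(handles=handles, title="Group")
```

The proxy handles are never drawn on the axes; they only tell the legend which color and label to show for each group.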

@jreback
Contributor

jreback commented Mar 16, 2016

this is essentially implemented in #8018

@Ochirgarid

@GA-Goig I think this issue is solved. Confirm and close the issue.

5 participants