\documentclass{beamer}
\usepackage{framed}
\usepackage{graphicx}
\begin{document}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
Plotting on data-aware grids
\begin{itemize}
\item When exploring medium-dimensional data, a useful approach is to draw multiple instances of the same plot on different subsets of your dataset.
\item This technique is sometimes called “lattice” or “trellis” plotting, and it is related to the idea of “small multiples”. It allows a viewer to quickly extract a large amount of information about complex data.
\item Matplotlib offers good support for making figures with multiple axes; seaborn builds on top of this to directly link the structure of the plot to the structure of your dataset.
\end{itemize}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
To use these features, your data has to be in a pandas DataFrame and it must take the form of what Hadley Wickham calls “tidy” data. In brief, that means your dataframe should be structured such that each column is a variable and each row is an observation.
\end{frame}
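%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
As a minimal sketch of the reshaping this implies, the following converts a small wide table to tidy form with \texttt{DataFrame.melt}. The table and its column names (subject, lunch, dinner) are invented for illustration and are not part of any seaborn dataset.
\begin{verbatim}
import pandas as pd

# Hypothetical wide table: one column per meal.
wide = pd.DataFrame({"subject": [1, 2],
                     "lunch": [2.5, 3.0],
                     "dinner": [4.0, 5.5]})

# Tidy form: each row is a single observation.
tidy = wide.melt(id_vars="subject",
                 var_name="time",
                 value_name="tip")
print(tidy)
\end{verbatim}
\end{frame}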
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\begin{itemize}
\item For advanced use, you can work directly with the objects discussed in this part of the tutorial, which provides maximum flexibility. Some seaborn functions (such as \texttt{lmplot()}, \texttt{catplot()}, and \texttt{pairplot()}) also use them behind the scenes.
\item Unlike other seaborn functions that are “Axes-level” and draw onto specific (possibly already-existing) matplotlib Axes without otherwise manipulating the figure, these higher-level functions create a figure when called and are generally more strict about how it gets set up.
\end{itemize}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\begin{itemize}
\item
In some cases, arguments either to those functions or to the constructor of the class they rely on will provide a different interface to attributes like the figure size, as in the case of \texttt{lmplot()}, where you can set the height and aspect ratio for each facet rather than the overall size of the figure.
\item Any function that uses one of these objects will always return it after plotting, though, and most of these objects have convenience methods for changing how the plot is drawn, often in a more abstract and easy way.
\end{itemize}
\end{frame}
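%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
A minimal sketch of this pattern, assuming seaborn 0.9 or later (where the per-facet size parameter is named \texttt{height}). The data are synthetic and the column names are invented for illustration.
\begin{verbatim}
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with two groups.
rng = np.random.RandomState(0)
df = pd.DataFrame({"x": rng.normal(size=60),
                   "group": np.repeat(["a", "b"], 30)})
df["y"] = 2 * df["x"] + rng.normal(size=60)

# height and aspect describe each facet.
g = sns.lmplot(x="x", y="y", col="group", data=df,
               height=3, aspect=1.2)

# The returned FacetGrid allows further tweaks.
g.set_axis_labels("Predictor", "Response")
\end{verbatim}
\end{frame}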
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\begin{verbatim}
%matplotlib inline
import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats
import matplotlib as mpl
import matplotlib.pyplot as plt
sns.set(style="ticks")
np.random.seed(sum(map(ord, "axis_grids")))
\end{verbatim}
\end{frame}
\section{Subsetting data with FacetGrid}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\noindent \textbf{Subsetting data with FacetGrid}
\begin{itemize}
\item The FacetGrid class is useful when you want to visualize the distribution of a variable or the relationship between multiple variables separately within subsets of your dataset.
\item A FacetGrid can be drawn with up to three dimensions: row, col, and hue.
\item The first two have obvious correspondence with the resulting array of axes; think of the hue variable as a third dimension along a depth axis, where different levels are plotted with different colors.
\end{itemize}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
The class is used by initializing a FacetGrid object with a dataframe and the names of the variables that will form the row, column, or hue dimensions of the grid. These variables should be categorical or discrete, and then the data at each level of the variable will be used for a facet along that axis. For example, say we wanted to examine differences between lunch and dinner in the tips dataset.
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
Additionally, both \texttt{lmplot()} and \texttt{catplot()} use this object internally, and they return the object when they are finished so that it can be used for further tweaking.
\begin{verbatim}
tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, col="time")
\end{verbatim}
Initializing the grid like this sets up the matplotlib figure and axes, but doesn’t draw anything on them.
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
The main approach for visualizing data on this grid is with the FacetGrid.map() method. Provide it with a plotting function and the name(s) of variable(s) in the dataframe to plot. Let’s look at the distribution of tips in each of these subsets, using a histogram.
\begin{verbatim}
g = sns.FacetGrid(tips, col="time")
g.map(plt.hist, "tip");
\end{verbatim}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
This function will draw the figure and annotate the axes, hopefully producing a finished plot in one step. To make a relational plot, just pass multiple variable names. You can also provide keyword arguments, which will be passed to the plotting function:
\begin{verbatim}
g = sns.FacetGrid(tips, col="sex", hue="smoker")
g.map(plt.scatter, "total_bill", "tip", alpha=.7)
g.add_legend();
\end{verbatim}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
There are several options for controlling the look of the grid that can be passed to the class constructor.
\begin{verbatim}
g = sns.FacetGrid(tips, row="smoker", col="time", margin_titles=True)
g.map(sns.regplot, "size", "total_bill", color=".3", fit_reg=False, x_jitter=.1);
\end{verbatim}
Note that margin\_titles isn’t formally supported by the matplotlib API, and may not work well in all cases. In particular, it currently can’t be used with a legend that lies outside of the plot.
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
The size of the figure is set by providing the height of each facet, along with the aspect ratio:
\begin{verbatim}
g = sns.FacetGrid(tips, col="day", height=4, aspect=.5)
g.map(sns.barplot, "sex", "total_bill");
\end{verbatim}
\end{frame}
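%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
As a rough worked example of this arithmetic, assume the figure is sized as the grid shape times the facet size (a legend placed outside the plot may change the final width). The helper below is hypothetical and not part of seaborn.
\begin{verbatim}
def facet_figure_size(n_rows, n_cols, height,
                      aspect=1.0):
    """Figure (width, height) in inches."""
    # Each facet is height * aspect inches wide.
    facet_width = height * aspect
    return n_cols * facet_width, n_rows * height

# One row, four columns, height=4, aspect=.5
print(facet_figure_size(1, 4, 4, 0.5))
\end{verbatim}
Under that assumption, four facets with a height of 4 and an aspect of 0.5 give a figure 8 inches wide and 4 inches tall.
\end{frame}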
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
With versions of matplotlib > 1.4, you can pass parameters to be used in the gridspec module. These can be used to draw attention to a particular facet by increasing its size. This is particularly useful when visualizing distributions of datasets with unequal numbers of groups in each facet.
\begin{verbatim}
titanic = sns.load_dataset("titanic")
titanic = titanic.assign(deck=titanic.deck.astype(object))
titanic = titanic.sort_values("deck")
g = sns.FacetGrid(titanic, col="class", sharex=False,
                  gridspec_kws={"width_ratios": [5, 3, 3]})
g.map(sns.boxplot, "deck", "age");
\end{verbatim}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
The default ordering of the facets is derived from the information in the DataFrame. If the variable used to define facets has a categorical type, then the order of the categories is used. Otherwise, the facets will be in the order of appearance of the category levels. It is possible, however, to specify an ordering of any facet dimension with the appropriate \texttt{$\ast$\_order} parameter:
\begin{verbatim}
ordered_days = tips.day.value_counts().index
g = sns.FacetGrid(tips, row="day", row_order=ordered_days,
                  height=1.7, aspect=4)
g.map(sns.distplot, "total_bill", hist=False, rug=True);
\end{verbatim}
\end{frame}
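%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
The default rule described above can be sketched in plain pandas. The helper \texttt{facet\_order} is hypothetical and only mirrors the rule as stated here; it is not seaborn's own implementation.
\begin{verbatim}
import pandas as pd

def facet_order(values):
    """Hypothetical helper: default facet order."""
    values = pd.Series(values)
    if isinstance(values.dtype, pd.CategoricalDtype):
        # Categorical: use the category order.
        return list(values.cat.categories)
    # Otherwise: order of first appearance.
    return list(values.dropna().unique())

days = pd.Series(["Sat", "Thur", "Sat", "Fri"])
dtype = pd.CategoricalDtype(["Thur", "Fri", "Sat"])
print(facet_order(days))
print(facet_order(days.astype(dtype)))
\end{verbatim}
\end{frame}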
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\begin{itemize}
\item Any seaborn color palette (i.e., something that can be passed to \texttt{color\_palette()}) can be provided.
\item You can also use a dictionary that maps the names of values in the hue variable to valid matplotlib colors:
\end{itemize}
\begin{verbatim}
pal = dict(Lunch="seagreen", Dinner="gray")
g = sns.FacetGrid(tips, hue="time", palette=pal, height=5)
g.map(plt.scatter, "total_bill", "tip", s=50, alpha=.7,
      linewidth=.5, edgecolor="white")
g.add_legend();
\end{verbatim}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\begin{itemize}
\item You can also let other aspects of the plot vary across levels of the hue variable, which can be helpful for making plots that will be more comprehensible when printed in black-and-white.
\item To do this, pass a dictionary to hue\_kws where keys are the names of plotting function keyword arguments and values are lists of keyword values, one for each level of the hue variable.
\end{itemize}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\begin{verbatim}
g = sns.FacetGrid(tips, hue="sex", palette="Set1", height=5,
                  hue_kws={"marker": ["^", "v"]})
g.map(plt.scatter, "total_bill", "tip", s=100,
      linewidth=.5, edgecolor="white")
g.add_legend();
\end{verbatim}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
If you have many levels of one variable, you can plot it along the columns but “wrap” them so that they span multiple rows. When doing this, you cannot use a row variable.
\begin{verbatim}
attend = sns.load_dataset("attention").query("subject <= 12")
g = sns.FacetGrid(attend, col="subject", col_wrap=4,
                  height=2, ylim=(0, 10))
g.map(sns.pointplot, "solutions", "score", color=".3", ci=None);
\end{verbatim}
\end{frame}
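%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
As a small illustration of how wrapping lays out the facets, assume the grid uses \texttt{col\_wrap} columns and as many rows as are needed to hold every level. The helper below is hypothetical and not part of seaborn.
\begin{verbatim}
import math

def wrapped_grid_shape(n_levels, col_wrap):
    """Hypothetical helper: (rows, columns)."""
    # Enough rows to fit every level.
    n_rows = math.ceil(n_levels / col_wrap)
    return n_rows, col_wrap

# 12 subjects wrapped at 4 columns
print(wrapped_grid_shape(12, 4))
\end{verbatim}
Under that assumption, twelve subjects wrapped at four columns fill three rows of four facets.
\end{frame}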
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\begin{itemize}
\item Once you’ve drawn a plot using \texttt{FacetGrid.map()} (which can be called multiple times), you may want to adjust some aspects of the plot.
\item There are also a number of methods on the \texttt{FacetGrid} object for manipulating the figure at a higher level of abstraction.
\item The most general is \texttt{FacetGrid.set()}, and there are other more specialized methods like \texttt{FacetGrid.set\_axis\_labels()}, which respects the fact that interior facets do not have axis labels. For example:
\end{itemize}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\begin{verbatim}
with sns.axes_style("white"):
    g = sns.FacetGrid(tips, row="sex", col="smoker",
                      margin_titles=True, height=2.5)
g.map(plt.scatter, "total_bill", "tip", color="#334488",
      edgecolor="white", lw=.5);
g.set_axis_labels("Total bill (US Dollars)", "Tip");
g.set(xticks=[10, 30, 50], yticks=[2, 6, 10]);
g.fig.subplots_adjust(wspace=.02, hspace=.02);
\end{verbatim}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\begin{itemize}
\item For even more customization, you can work directly with the underlying matplotlib Figure and Axes objects, which are stored as member attributes at \texttt{fig} and \texttt{axes} (a two-dimensional array), respectively.
\item When making a figure without row or column faceting, you can also use the ax attribute to directly access the single axes.
\end{itemize}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\begin{verbatim}
g = sns.FacetGrid(tips, col="smoker", margin_titles=True,
                  height=4)
g.map(plt.scatter, "total_bill", "tip", color="#338844",
      edgecolor="white", s=50, lw=1)
for ax in g.axes.flat:
    ax.plot((0, 50), (0, .2 * 50), c=".2", ls="--")
g.set(xlim=(0, 60), ylim=(0, 14));
\end{verbatim}
% ../_images/axis_grids_30_0.png
\end{frame}
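%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
For the single-axes case, a minimal sketch of using the \texttt{ax} attribute follows, assuming seaborn 0.9 or later. The small dataframe and its column names are invented for illustration.
\begin{verbatim}
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data; no row or col faceting.
df = pd.DataFrame({"x": [1, 2, 3, 4],
                   "y": [2, 1, 4, 3],
                   "kind": ["a", "b", "a", "b"]})
g = sns.FacetGrid(df, hue="kind", height=3)
g.map(plt.scatter, "x", "y")

# With one facet, g.ax is the only Axes.
g.ax.axhline(2.5, ls="--", c=".2")
\end{verbatim}
\end{frame}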
\section{Mapping custom functions onto the grid}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\noindent \textbf{Mapping custom functions onto the grid}
\begin{itemize}
\item You’re not limited to existing matplotlib and seaborn functions when using FacetGrid. However, to work properly, any function you use must follow a few rules:
\item It must plot onto the “currently active” matplotlib Axes. This will be true of functions in the matplotlib.pyplot namespace, and you can call \texttt{plt.gca} to get a reference to the current Axes if you want to work directly with its methods.
\item It must accept the data that it plots in positional arguments. Internally, FacetGrid will pass a Series of data for each of the named positional arguments passed to \texttt{FacetGrid.map()}.
\end{itemize}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\begin{itemize}
\item It must be able to accept color and label keyword arguments, and, ideally, it will do something useful with them.
\item In most cases, it’s easiest to catch a generic dictionary of $\ast \ast$\texttt{kwargs} and pass it along to the underlying plotting function.
\item Let’s look at a minimal example of a function you can plot with. This function will just take a single vector of data for each facet:
\end{itemize}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\begin{framed}
\begin{verbatim}
def quantile_plot(x, **kwargs):
    qntls, xr = stats.probplot(x, fit=False)
    plt.scatter(xr, qntls, **kwargs)

g = sns.FacetGrid(tips, col="sex", height=4)
g.map(quantile_plot, "total_bill");
\end{verbatim}
\end{framed}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
If you want to make a bivariate plot, you should write the function so that it accepts the x-axis variable first and the y-axis variable second:
\begin{framed}
\begin{verbatim}
def qqplot(x, y, **kwargs):
    _, xr = stats.probplot(x, fit=False)
    _, yr = stats.probplot(y, fit=False)
    plt.scatter(xr, yr, **kwargs)

g = sns.FacetGrid(tips, col="smoker", height=4)
g.map(qqplot, "total_bill", "tip");
\end{verbatim}
\end{framed}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
Because plt.scatter accepts color and label keyword arguments and does the right thing with them, we can add a hue facet without any difficulty:
\begin{verbatim}
g = sns.FacetGrid(tips, hue="time", col="sex", height=4)
g.map(qqplot, "total_bill", "tip")
g.add_legend();
\end{verbatim}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
This approach also lets us use additional aesthetics to distinguish the levels of the hue variable, along with keyword arguments that won’t be dependent on the faceting variables:
\begin{verbatim}
g = sns.FacetGrid(tips, hue="time", col="sex", height=4,
                  hue_kws={"marker": ["s", "D"]})
g.map(qqplot, "total_bill", "tip", s=40, edgecolor="w")
g.add_legend();
\end{verbatim}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\begin{itemize}
\item Sometimes, though, you’ll want to map a function that doesn’t work the way you expect with the color and label keyword arguments.
\item In this case, you’ll want to explicitly catch them and handle them in the logic of your custom function.
\item For example, this approach will allow us to map \texttt{plt.hexbin}, which otherwise does not play well with the FacetGrid API:
\end{itemize}
\end{frame}
%====================================%
\begin{frame}[fragile]
\frametitle{Seaborn Workshop}
\large
\begin{verbatim}
def hexbin(x, y, color, **kwargs):
    cmap = sns.light_palette(color, as_cmap=True)
    plt.hexbin(x, y, gridsize=15, cmap=cmap, **kwargs)

with sns.axes_style("dark"):
    g = sns.FacetGrid(tips, hue="time", col="time", height=4)
g.map(hexbin, "total_bill", "tip", extent=[0, 50, 0, 10]);
\end{verbatim}
\end{frame}
%=========================================================== %
\end{document}