Add vignette on combine cut interaction functions #195

Open
wants to merge 18 commits into
base: master
Changes from 10 commits
2 changes: 2 additions & 0 deletions _pkgdown.yml
@@ -234,6 +234,8 @@ navbar:
href: articles/export.html
- text: Subtotals and Headings
href: articles/subtotals.html
- text: Combining Answers and Variables
href: articles/combine-cut-interact.html
- text: Crunch Internals
href: articles/crunch-internals.html
- text: Abstract Categories
1 change: 1 addition & 0 deletions inst/WORDLIST
@@ -93,6 +93,7 @@ Multitables
na
nd
NumericVariable
olds
OrderGroup
OrderGroups
PermissionCatalog
56 changes: 48 additions & 8 deletions vignette-data/make-vignette-rdata.R
@@ -1,8 +1,8 @@
library(crunch)
options(crunch.api=getOption("test.api"),
crunch.debug=FALSE,
crunch.email=getOption("test.user"),
crunch.pw=getOption("test.pw"))
# options(crunch.api=getOption("test.api"),
# crunch.debug=FALSE,
# crunch.email=getOption("test.user"),
# crunch.pw=getOption("test.pw"))
Contributor

I should have mentioned this before, Brian: we should set up your R profile with values for these options that point to a good backend. In the repository, these lines should stay uncommented.

login()

## 1. Getting started
@@ -35,6 +35,7 @@ ds$imiss <- makeArray(ds[grep("^imiss_", names(ds))], name="Issue importance")
show_imiss_subvars <- crunch:::showSubvariables(subvariables(ds$imiss))
show_imiss <- capture.output(print(ds$imiss))
names_imiss_subvars <- names(subvariables(ds$imiss))
imiss.cats <- categories(ds$imiss)

newnames <- c("The economy", "Immigration",
"The environment", "Terrorism", "Gay rights", "Education",
@@ -45,13 +46,29 @@ show_imiss_subvars2 <- crunch:::showSubvariables(subvariables(ds$imiss))
sorting <- order(names(subvariables(ds$imiss)))
subvariables(ds$imiss) <- subvariables(ds$imiss)[sorting]
show_imiss_subvars3 <- crunch:::showSubvariables(subvariables(ds$imiss))
ds$imiss_topboxes <- combine(ds$imiss, name="Issue Importance (Top Boxes)",
combinations=list(
list(name="Important", categories=c("Very Important", "Somewhat Important")),
list(name="Not Important", categories=c("Not very Important", "Unimportant"))
))
imiss_topboxes.cats <- categories(ds$imiss_topboxes)

show_boap_4 <- capture.output(print(ds$boap_4))
ds$boap <- makeMR(ds[grep("^boap_[0-9]+", names(ds))],
name="Approval of Obama on issues",
selections=c("Strongly approve", "Somewhat approve"))
show_boap_subvars <- crunch:::showSubvariables(subvariables(ds$boap))
show_boap <- c(crunch:::showCrunchVariableTitle(ds$boap),
show_boap_subvars)
ds$boap_combined <- combine(ds$boap, name="Approval of Obama on issues (Combined Subvariables)",
combinations=list(
list(name="All Others", responses=c('boap_2', 'boap_3', 'boap_4', 'boap_5', 'boap_6',
'boap_7', 'boap_8', 'boap_9', 'boap_10', 'boap_11'))
))
show_boap_combined_subvars <- crunch:::showSubvariables(subvariables(ds$boap_combined))
show_boap_combined <- c(crunch:::showCrunchVariableTitle(ds$boap_combined),
show_boap_combined_subvars)

ds$boap <- undichotomize(ds$boap)
show_boap2 <- capture.output(print(ds$boap))
ds$boap <- dichotomize(ds$boap, "Strongly approve")
@@ -130,6 +147,7 @@ exclusion(ds) <- ds$perc_skipped > 15
high_perc_skipped <- capture.output(print(exclusion(ds)))
dim.ds.excluded <- dim(ds)


message("subtotals")
sub_initial_subtotals <- subtotals(ds$manningknowledge)
subtotals(ds$manningknowledge) <- list(
@@ -151,16 +169,38 @@ sub_headings <- subtotals(ds$obamaapp)
subtotals(ds$obamaapp) <- NULL
approve_subtotals <- list(
Subtotal(name = "Approves",
categories = c("Somewhat approve", "Strongly approve"),
after = "Somewhat approve"),
categories = c("Somewhat approve", "Strongly approve"),
after = "Somewhat approve"),
Subtotal(name = "Disapprove",
categories = c("Somewhat disapprove", "Strongly disapprove"),
after = "Strongly disapprove"))
categories = c("Somewhat disapprove", "Strongly disapprove"),
after = "Strongly disapprove"))
subtotals(ds$snowdenleakapp) <- approve_subtotals
subtotals(ds$congapp) <- approve_subtotals
sub_snowdon <- subtotals(ds$snowdenleakapp)
sub_con <- subtotals(ds$congapp)
sub_crtab <- crtabs(~congapp + gender, ds)


message("10. Re-Combining Answers and Variables")
ds$age4 <- cut(ds$age, name="Age (4 categories)",
breaks=c(17,29,44,64,100), labels=c('18-29', '30-44', '45-64', '65+'))
age4.var <- ds$age4
summary.age4.var <- capture.output(print(age4.var))
age4.cats <- categories(ds$age4)
ds$age3 <- combine(ds$age4, name="Age (3 categories)",
combinations=list(
list(name="18-44", categories=c('18-29', '30-44'))
))
age3.var <- ds$age3
summary.age3.var <- capture.output(print(age3.var))
age3.cats <- categories(ds$age3)
gender.var <- ds$gender
summary.gender.var <- capture.output(print(gender.var))
ds$gender_by_age <- interactVariables(ds$gender, ds$age3, name="Gender by Age")
gender_by_age.var <- ds$gender_by_age
summary.gender_by_age.var <- capture.output(print(gender_by_age.var))
gender_by_age.cats <- categories(ds$gender_by_age)

save.image(file="../vignettes/vignettes.RData")

with_consent(delete(ds)) ## cleanup
157 changes: 157 additions & 0 deletions vignettes/combine-cut-interact.Rmd
@@ -0,0 +1,157 @@
---
title: "Combining Answers and Variables"
description: "Vignette showing you how to take existing variables and recombine their answers or other variables."
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Combining Answers and Variables}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

[Previous: subtotals](subtotals.html)

```{r, results='hide', echo=FALSE, message=FALSE}
## Because the vignette tasks require communicating with a remote host,
## we do all the work ahead of time and save a workspace, which we load here.
## We'll then reference saved objects in that as if we had just retrieved them
## from the server
library(crunch)
load("vignettes.RData")
options(width=120)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

Many common data cleaning steps involve grouping a number of categories or values together for easier analysis. Crunch provides a number of functions which make this kind of work easy:

- `cut()` allows you to transform a continuous numeric variable into a set of bins
- `combine()` lets you collapse a categorical variable's categories together
- `subtotals()` displays subtotaled categories alongside the other categories
Contributor

We don't go into too much detail about this lower down, so I would vote that we remove it from our list here.

- `interactVariables()` creates a new variable by interacting the categories of two categorical variables

This vignette goes through examples of each of these functions to show how they can be used together.

# Cutting a Numeric Variable into Categories
Say we have a numeric type variable `age`, measured in years from 18 to 99, and we want to place each answer into one of a few categories: 18-29, 30-44, etc. We can use the `cut()` function to do just that and _cut_ the numeric variable into a new Categorical type variable. We designed this function to match the way that base R's `cut()` function works.

```{r, eval = FALSE}
ds$age4 <- cut(ds$age,
name = "Age (4 categories)",
breaks = c(17, 29, 44, 64, 100),
labels = c('18-29', '30-44', '45-64', '65+')
)
```
```{r, eval = FALSE}
ds$age4
```
```{r, echo = FALSE}
cat(summary.age4.var, sep = "\n")
```

We now have a new Categorical variable with the alias `age4` and the name "Age (4 categories)". The variable has four categories based on the breaks we supplied to `cut()`.
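
Because Crunch's `cut()` is designed to mirror base R's, the binning logic can be tried locally without a Crunch connection. The following is a minimal sketch using base R's `cut()` on a made-up vector of ages (the values are illustrative and not taken from the dataset). It relies on base R's default `right = TRUE`, under which each interval includes its upper break.

```r
# Made-up ages for illustration only; not taken from the Crunch dataset.
age <- c(18, 29, 30, 44, 45, 64, 65, 99)

# Base R's cut(): with the default right = TRUE the intervals are
# (17,29], (29,44], (44,64], (64,100].
age4 <- cut(age,
    breaks = c(17, 29, 44, 64, 100),
    labels = c("18-29", "30-44", "45-64", "65+")
)

# Count how many made-up respondents fall into each bin.
table(age4)
```

The lowest break is 17 rather than 18 so that 18-year-olds fall inside the first interval.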

# Combining Answer Choices
## Categorical Type Variables
Sometimes we want to create subtotals (aka "nets" or "top boxes") for a Categorical type variable: we preserve all of the original categories and also show two or more of them collapsed together. For those cases we should use the `subtotals()` function (for more details, see [the subtotals vignette](subtotals.html)). Other times we do *not* want to preserve all the original categories and instead want to combine them into a smaller set of categories. To do that we use the `combine()` function.

Let's take the variable "Age (4 categories)" and combine the two youngest categories to create a new variable we will call "Age (3 categories)".

```{r, eval = FALSE}
categories(ds$age4)
```
```{r, echo = FALSE}
age4.cats
```
```{r, eval = FALSE}
ds$age3 <- combine(ds$age4,
name="Age (3 categories)",
combinations=list(
list(name="18-44", categories=c('18-29', '30-44'))
)
)
```
```{r, eval = FALSE}
categories(ds$age3)
```
```{r, echo = FALSE}
age3.cats
```
And now we have a new variable with the alias `age3`, the name "Age (3 categories)", and a category that combines 18 to 44 year-olds.
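
For readers who know base R factors, collapsing categories resembles reassigning factor levels. The sketch below is only a local analogy on made-up data, not how Crunch implements `combine()`. One difference worth noting: in base R, any level left out of the list becomes `NA`, so every level is listed explicitly here, whereas the `combine()` call above names only the categories being merged.

```r
# Made-up responses for illustration only.
age4 <- factor(
    c("18-29", "30-44", "45-64", "65+", "30-44"),
    levels = c("18-29", "30-44", "45-64", "65+")
)

# Work on a copy so the original factor is left untouched.
age3 <- age4
levels(age3) <- list(
    "18-44" = c("18-29", "30-44"),  # the two youngest groups merged
    "45-64" = "45-64",              # carried over unchanged
    "65+"   = "65+"                 # carried over unchanged
)

levels(age3)
table(age3)
```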

Note that `combine()` created an entirely new variable, so we can use it just like any other variable in Crunch. If we no longer need the original "Age (4 categories)" variable, we can hide it; hiding the original will not affect our new variable.
Contributor

Do we want to link to the vignette that explains hide? or the help file? Or provide an example?


## Categorical Array Type Variables
We can use the `combine()` function to combine the categories of a Categorical Array in the same way that we combined the categories of a Categorical variable.

Let's take the Categorical Array variable "Issue importance", which has the alias `imiss` and 11 subvariables with 4 categories: Very Important, Somewhat Important, Not very Important, and Unimportant. We would like to create a new Categorical Array variable that combines the two "Important" categories into one category and the two "Not Important" categories into another.

```{r, eval = FALSE}
categories(ds$imiss)
```
```{r, echo = FALSE}
imiss.cats
```
```{r, eval = FALSE}
ds$imiss_topboxes <- combine(ds$imiss,
name = "Issue Importance (Top Boxes)",
combinations = list(
list(name = "Important", categories = c("Very Important", "Somewhat Important")),
list(name = "Not Important", categories = c("Not very Important", "Unimportant"))
)
)
```
```{r, eval = FALSE}
categories(ds$imiss_topboxes)
```
```{r, echo = FALSE}
imiss_topboxes.cats
```
We have created a new Categorical Array variable with the alias `imiss_topboxes`, the name "Issue Importance (Top Boxes)", and 2 categories instead of the original variable's 4.
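
As a local analogy, a Categorical Array can be pictured as a set of factors that share the same levels, so collapsing its categories amounts to applying one recoding to every subvariable. The sketch below uses base R on made-up data; the data frame, the two subvariable names, and the helper `collapse_importance()` are all hypothetical and are not part of the crunch package.

```r
importance_levels <- c("Very Important", "Somewhat Important",
    "Not very Important", "Unimportant")

# A made-up "array" with two subvariables that share one set of categories.
imiss <- data.frame(
    economy = factor(
        c("Very Important", "Unimportant", "Somewhat Important"),
        levels = importance_levels
    ),
    immigration = factor(
        c("Not very Important", "Very Important", "Unimportant"),
        levels = importance_levels
    )
)

# Hypothetical helper: collapse the four levels of one factor into two.
collapse_importance <- function(x) {
    levels(x) <- list(
        "Important" = c("Very Important", "Somewhat Important"),
        "Not Important" = c("Not very Important", "Unimportant")
    )
    x
}

# Apply the same recoding to every subvariable.
imiss_topboxes <- as.data.frame(lapply(imiss, collapse_importance))
sapply(imiss_topboxes, nlevels)
```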


## Multiple Response Type Variables
At first it might not seem that we can use the `combine()` function with Multiple Response type variables because each subvariable in the multiple response has already been reduced down to the categories that are "selected" or "not selected". However, there is an option that allows us to combine the subvariables (aka responses) in a multiple response similar to how we combined the categories in a categorical variable.

```{r, eval = FALSE}
ds$boap
```
```{r, echo = FALSE}
show_boap
```
```{r, eval = FALSE}
ds$boap_combined <- combine(ds$boap,
name="Approval of Obama on issues (Combined Subvariables)",
combinations=list(
list(name = "All Others",
responses = c('boap_2', 'boap_3', 'boap_4', 'boap_5', 'boap_6',
'boap_7', 'boap_8', 'boap_9', 'boap_10', 'boap_11'))
)
)
```
```{r, eval = FALSE}
ds$boap_combined
```
```{r, echo = FALSE}
show_boap_combined
```

We have created a new Multiple Response type variable with the alias `boap_combined` and the name "Approval of Obama on issues (Combined Subvariables)". It has 4 subvariables instead of the original 13.
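
The subvariable count can be checked with a small local sketch: merging 10 of 13 responses into one leaves 13 - 10 + 1 = 4. The base R code below uses a made-up logical matrix of selections. It assumes that a combined response counts as selected when any of its component responses is selected; that rule, and the position of the combined column, are assumptions made for illustration rather than something stated in this vignette.

```r
# Made-up selection data: 3 respondents by 13 responses, TRUE = selected.
aliases <- paste0("boap_", 1:13)
selected <- matrix(FALSE, nrow = 3, ncol = 13,
    dimnames = list(NULL, aliases))
selected[1, "boap_1"] <- TRUE
selected[2, c("boap_3", "boap_12")] <- TRUE

# The ten responses to merge into "All Others".
others <- paste0("boap_", 2:11)

# Assumption: the merged response is selected if any component is selected.
combined <- cbind(
    selected[, setdiff(aliases, others), drop = FALSE],
    "All Others" = rowSums(selected[, others, drop = FALSE]) > 0
)

colnames(combined)
ncol(combined)
```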

# Combining Variables
Besides combining answer choices, we can also combine variables. For example, in our survey we asked people their gender and age. For our analysis we'd also like to have a third variable that combines gender and age together so that people are categorized as "Females, 18-44", "Females, 45-64", "Males, 18-44", etc. We can cross gender and age to create a new variable using the `interactVariables()` function (named after 'interaction terms' in regression analysis).

Contributor

It might be nice to add a more elaborated use case in here.

```{r interact 2 cats, eval = FALSE}
ds$gender_by_age <- interactVariables(ds$gender, ds$age3, name = "Gender by Age")
```
```{r, eval = FALSE}
categories(ds$gender_by_age)
```
```{r interaction var, echo = FALSE}
gender_by_age.cats
```

This generates a new Categorical variable with a category for each possible combination of the categories of the two input variables; in this case, one category for each combination of gender and age group.
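
The closest base R counterpart is `interaction()`, which crosses the levels of two factors. The sketch below uses made-up data to show that two gender levels and three age levels yield six combined levels. The category labels that Crunch generates may be formatted differently; the `", "` separator here is just a choice for this example.

```r
# Made-up respondents for illustration only.
gender <- factor(c("Male", "Female", "Female", "Male"))
age3 <- factor(
    c("18-44", "45-64", "65+", "18-44"),
    levels = c("18-44", "45-64", "65+")
)

# Cross the two factors; every level combination becomes a level,
# including combinations that no respondent falls into.
gender_by_age <- interaction(gender, age3, sep = ", ")

nlevels(gender_by_age)
as.character(gender_by_age)
```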

[Next: Crunch internals](crunch-internals.html)
2 changes: 1 addition & 1 deletion vignettes/crunch-internals.Rmd
@@ -37,4 +37,4 @@ datasets(proj) <- ds

Internally there is actually a second method `datasets<-` that takes the value on the right hand side of the `<-` operator and posts that value to the datasets attribute of the project catalog. The projects catalog will then update to reflect that a dataset belongs to a particular catalog, and that will be reflected in the web app. Similar patterns happen when you get and set attributes on objects, like "names".

[Next: Category objects](abstract-categories.html)
[Next: category objects](abstract-categories.html)
2 changes: 1 addition & 1 deletion vignettes/datasets.Rmd
@@ -8,7 +8,7 @@ vignette: >
%\VignetteEncoding{UTF-8}
---

[Previous: Getting started](getting-started.html)
[Previous: getting started](getting-started.html)

```{r, results='hide', echo=FALSE, message=FALSE}
## Because the vignette tasks require communicating with a remote host,
1 change: 1 addition & 0 deletions vignettes/getting-started.Rmd
@@ -48,5 +48,6 @@ The Crunch data store is built around datasets, which contain variables. Unlike
* [Filtering](filters.html): subsetting data, both in your R session and in the web interface
* [Downloading and exporting](export.html): how to pull data from the server, both for use in R and file export
* [Subtotals and headings](subtotals.html): how to set and get subtotals and headings for categorical variables
* [Combining answers and variables](combine-cut-interact.html): using Crunch tools to combine categories, responses, and variables
* [Crunch internals](crunch-internals.html): an introduction to the Crunch API and concepts to help you make more complex and more efficient queries
* [Category objects](abstract-categories.html): an introduction to the S4 classes that power categories and category-like representations in the package
2 changes: 1 addition & 1 deletion vignettes/subtotals.Rmd
@@ -159,4 +159,4 @@ noTransforms(sub_crtab)

This does not modify the variable---the subtotals are still defined and visible in the web app---but they are removed from the current analysis.

[Next: Crunch internals](crunch-internals.html)
[Next: combining answers and variables](combine-cut-interact.html)
Binary file modified vignettes/vignettes.RData
Binary file not shown.