---
title: "GroupThink"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
![Banner image for GroupThink package](https://github.com/Samuel-Osian-Andrews/GroupThink/blob/main/readme_files/GroupThink.png)
## Introduction and Install
GroupThink is a package designed to assist in the analysis of categorical survey data. It mainly acts as an interface for existing `tidyverse` functions - but makes it easier to aggregate responses, do cross-question analysis, and avoid classic mistakes typical of survey data analysis.
It currently has two functions: `unify()` and `assess()` (though others are planned for the future).
GroupThink isn't on CRAN, so you'll need to use `devtools` to install it. Run:
```{r install, results='hide', warning=FALSE, error=FALSE, message=FALSE}
install.packages("devtools")
library(devtools)
devtools::install_github("Samuel-Osian-Andrews/GroupThink")
library(GroupThink)
```
As GroupThink is still in development, you should periodically reinstall the package to get updates.
### Dependencies
GroupThink depends on the `dplyr`, `tidyr` and `gt` packages. If these aren't installed automatically when you install GroupThink, you may need to run:
```{r dependencies, error=FALSE, message=FALSE, results='hide'}
install.packages(c("dplyr", "tidyr", "gt"))
```
## Benefits of GroupThink
GroupThink is a response to key bottlenecks and common mistakes when analysing survey data. The package is beneficial because it...
- **Allows for easy groupings.** `unify()` makes it very easy to group together different Likert-style responses (e.g. combining `Somewhat agree` with `Strongly agree`, or `Excellent` and `Very good`). Incorrect groupings are much harder to miss, as `unify()` alerts you to any unassigned responses.
- **Automates your calculations.** `unify()` handles n and proportion calculations for you, so you no longer need complex data manipulation with functions such as `pivot_longer()`, which can produce inaccurate figures if you're not careful!
- **Works with full questions as column headers.** Typically, exporting survey responses (such as from Microsoft Forms or SurveyMonkey) will leave you with full questions (e.g. "Do you agree or disagree that...") as column headers. This is usually a nightmare to work with in R. Because `unify()` works on column indexes, rather than column names, you don't need to worry about recoding your columns or typing out full survey questions throughout your code.
- **Gives usable outputs.** `unify()` integrates neatly with `ggplot2`, allowing you to visualise your aggregated data. Alternatively, you can produce formatted tables through the `gtTable` argument.
- **Presents clear, readable syntax.** Even for those unfamiliar with R syntax, `unify()` makes it very clear exactly how you've grouped together your responses, improving readability and reproducibility.
- **Means faster insights.** With just a few lines of code, this function could save you hours' worth of work on large survey projects.
```{r simulate, echo=FALSE, results='hide', message=FALSE, error=FALSE}
# Activate libraries
library(tidyverse)
library(gt)
# Create a sample dataset
set.seed(1357)
data <- tibble(
  `I find the course material engaging and relevant.` = factor(sample(
    c("Strongly agree", "Somewhat agree", "Neither agree nor disagree", "Somewhat disagree", "Strongly disagree", "Don't know"),
    100, replace = TRUE), levels = c("Strongly disagree", "Somewhat disagree", "Neither agree nor disagree", "Somewhat agree", "Strongly agree", "Don't know")),
  `The course workload is manageable within my schedule.` = factor(sample(
    c("Highly agree", "Agree", "Neither agree nor disagree", "Disagree", "Highly disagree", "Unsure"),
    100, replace = TRUE), levels = c("Highly disagree", "Disagree", "Neither agree nor disagree", "Agree", "Highly agree", "Unsure")),
  `Feedback from assignments is helpful for my learning.` = factor(sample(
    c("Strongly agree", "Agree", "Indifferent", "Disagree", "Strongly disagree", "No opinion"),
    100, replace = TRUE), levels = c("Strongly disagree", "Disagree", "Indifferent", "Agree", "Strongly agree", "No opinion"))
)
# Function to randomly introduce NAs into a vector
introduce_NAs <- function(x, Proportion = 0.05) {
  na_indices <- sample(seq_along(x), size = floor(Proportion * length(x)), replace = FALSE)
  x[na_indices] <- NA
  return(x)
}
data <- data %>%
  mutate(
    `The course workload is manageable within my schedule.` = introduce_NAs(`The course workload is manageable within my schedule.`),
    `Feedback from assignments is helpful for my learning.` = introduce_NAs(`Feedback from assignments is helpful for my learning.`)
  )
```
## Core functionality
### The `unify()` function
The `unify()` function groups together Likert-style responses for a given question or set of questions, returning a summarised output that contains the n and proportion for each of these groupings.
```{r unify, error=TRUE}
unify(data, cols = 1, # ...dataframe name and column index number(s) to analyse
      # Below, we 'group' responses via custom grouping labels (e.g. 'Agree'):
      Agree = c("Somewhat agree", "Strongly agree"),
      Disagree = c("Somewhat disagree", "Strongly disagree"),
      Neutral = "Neither agree nor disagree",
      ignore = "Don't know") # ...optionally, set response(s) to ignore from calcs
```
The grouping labels can be anything you like. For example, `Agree` could instead be `Positive`, `Good`, `Satisfied` or something else entirely. Similarly, `Don't know` could be its own group, instead of being ignored. You may include as many grouping labels as you'd like.
There's of course nothing wrong with having just 1 response option per group (e.g. `"Somewhat agree" = "Somewhat agree"`). The main purpose of `unify()` is that it forces you to be **intentional** with how you handle your data, to improve consistency and avoid mistakes.
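To make the grouping step more concrete, here is a rough base R sketch of the kind of manual work that `unify()` is meant to save you. It is not the package's implementation: the toy `responses` vector, the `groups` list and the choice to express proportions as percentages are all assumptions made for illustration.

```r
# Hypothetical toy responses (not the README's simulated data)
responses <- c("Strongly agree", "Somewhat agree", "Somewhat disagree",
               "Neither agree nor disagree", "Don't know", "Strongly agree",
               "Strongly disagree", "Somewhat agree")

# Grouping scheme: group label -> raw responses that belong to it
groups <- list(
  Agree    = c("Somewhat agree", "Strongly agree"),
  Disagree = c("Somewhat disagree", "Strongly disagree"),
  Neutral  = "Neither agree nor disagree"
)
ignore <- "Don't know"

# Drop ignored responses before any calculation
kept <- responses[!responses %in% ignore]

# Build a lookup from raw response to group label
lookup <- setNames(rep(names(groups), lengths(groups)), unlist(groups))

# Recode each response to its group, then count and convert to percentages
grouped <- factor(unname(lookup[kept]), levels = names(groups))
n <- table(grouped)
manual_summary <- data.frame(
  Group      = names(n),
  n          = as.integer(n),
  Proportion = round(100 * as.integer(n) / length(kept), 1)
)
print(manual_summary)
```

Every step here (dropping ignored responses, recoding, counting, dividing by the right denominator) is a place where a manual pipeline can silently go wrong.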
### Left out responses
If you forgot to include a response in your custom groupings, `unify()` will throw an error. This is crucial for avoiding mistakes in your proportion calculations. For example:
```{r error, error=TRUE}
unify(data, 1, Agree = "Somewhat agree",
      #"Strongly agree"), -- let's stop unify() from seeing this line
      Disagree = c("Somewhat disagree", "Strongly disagree"),
      Neutral = "Neither agree nor disagree",
      ignore = "Don't know")
```
As seen above, the output tells you that you forgot to assign "Strongly agree" to a grouping variable.
These errors are crucial, since many other R functions will not warn you if you haven't accounted for a group, or if you've mistyped "Strongly **a**gree" as "Strongly **A**gree", for example.
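The idea behind this check can be sketched in a few lines of base R. The helper below, `check_assigned()`, is hypothetical and only illustrates the principle (compare the responses present in the data with the responses you have assigned); it is not how `unify()` is written, and its error message is made up.

```r
# Hypothetical helper: stop if any observed response has no group or ignore entry
check_assigned <- function(observed, assigned) {
  unassigned <- setdiff(unique(observed), assigned)
  if (length(unassigned) > 0) {
    stop("Unassigned responses: ", paste(unassigned, collapse = ", "))
  }
  invisible(TRUE)
}

observed <- c("Somewhat agree", "Strongly agree", "Somewhat disagree",
              "Strongly disagree", "Neither agree nor disagree", "Don't know")

# "Strongly agree" is deliberately left out of the assigned responses
assigned <- c("Somewhat agree", "Somewhat disagree", "Strongly disagree",
              "Neither agree nor disagree", "Don't know")

# Capture the error message instead of halting the script
result <- tryCatch(check_assigned(observed, assigned),
                   error = function(e) conditionMessage(e))
print(result)

# Matching is case-sensitive, so a mistyped label is caught in the same way
typo_caught <- !("Strongly Agree" %in% observed)
```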
### Data for unify()
The `unify()` function expects data that looks like this:
```{r head, echo=FALSE, error=TRUE}
head(data, n = 10)
```
Responses do not need to be consistently labelled, either within or between different questions/columns, and can contain missing data (you'll likely want to assign `NA` to the `ignore` parameter).
### View column index numbers
Since GroupThink functions work with column **index numbers**, not column names, you'll likely want to check the index number of each column in your dataset. For this, run `colnames()` from base R, which lists the column names in index order.
```{r colnames, error=TRUE}
colnames(data)
```
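If you'd rather see each index number printed next to its question, a small lookup table can help. This is a sketch using base R only; the `survey` data frame and the `index_lookup` name are invented for the example.

```r
# Hypothetical two-question survey with full questions as column headers
survey <- data.frame(
  `I find the course material engaging and relevant.` =
    c("Strongly agree", "Somewhat agree"),
  `The course workload is manageable within my schedule.` =
    c("Agree", "Unsure"),
  check.names = FALSE  # keep the full question text as the column name
)

# Pair each column index with its question
index_lookup <- data.frame(
  Index    = seq_along(survey),
  Question = colnames(survey)
)
print(index_lookup)
```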
### The `assess()` function
You might find it beneficial to run GroupThink's `assess()` function, which provides an overview of the different response options in your specified columns.
```{r assess, error=TRUE}
assess(data, cols = c(2, 3))
```
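For comparison, a similar overview can be put together by hand in base R. The sketch below only approximates the idea of listing the response options per column; the format of `assess()`'s real output is not reproduced here, and the `survey` data frame is invented.

```r
# Hypothetical survey data with inconsistent labels and missing values
survey <- data.frame(
  q1 = c("Agree", "Disagree", NA, "Agree"),
  q2 = c("Indifferent", "No opinion", "Agree", NA)
)
cols <- c(1, 2)

# Count each response option per column, keeping NA visible
overview <- lapply(survey[cols], function(x) table(x, useNA = "ifany"))
print(overview)

# Every distinct response option across the chosen columns,
# i.e. everything you would need to assign or ignore
all_options <- unique(unlist(lapply(survey[cols], unique)))
print(all_options)
```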
## Further functionality
#### Aggregate across multiple columns/questions
You are not restricted to analysing just one question/column with `unify()`. You can specify multiple columns/questions to use for the output:
```{r cols, error=TRUE}
unify(data, c(1, 2, 3), # ...analyse Columns 1, 2 and 3
      Positive = c("Somewhat agree", "Strongly agree", "Highly agree", "Agree"),
      Negative = c("Somewhat disagree", "Strongly disagree", "Highly disagree",
                   "Disagree"),
      ignore = c(NA, "Don't know", "Unsure", "Neither agree nor disagree",
                 "No opinion", "Indifferent"),
      hideN = TRUE) # ...(optional) hide n column from output (a lot cleaner!)
```
...Just make sure that you've accounted for each response option across your range of columns, otherwise you'll get an error.
#### Make formatted tables
Using the `gtTable` argument, `unify()` makes it simple to create nice, formatted tables.
```{r gtTable, error=TRUE, message=FALSE, warning=FALSE}
unify(data, 1, Agree = c("Somewhat agree", "Strongly agree"),
      Disagree = c("Somewhat disagree", "Strongly disagree"),
      Neutral = "Neither agree nor disagree",
      ignore = "Don't know",
      filter = c("Agree", "Disagree"),
      hideN = TRUE, # ...optionally, hide N column
      gtTable = TRUE) # ...set gtTable to TRUE
```
#### Filter out responses from the output only
If you only want to display certain groups in the output, you can use the `filter` argument.
Note that this is different from the `ignore` parameter: `filter` removes unwanted responses *after* the calculations have been performed, while `ignore` removes them *before*.
```{r filter, error=TRUE}
unify(data, 3,
      Agree = c("Agree", "Strongly agree"),
      Disagree = c("Disagree", "Strongly disagree"),
      Neither = c("No opinion", "Indifferent"),
      ignore = c(NA, "Don't know"),
      filter = "Agree") # ...only include the Agree group in the output
```
The other variable groupings are used for the calculations, but only "Agree" responses are shown in the final output.
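A small worked example shows why the order matters. The numbers below are made up, and the arithmetic is done by hand in base R rather than with `unify()`, with proportions expressed as percentages for illustration.

```r
# Hypothetical responses: 6 Agree, 3 Disagree, 1 Don't know
responses <- c(rep("Agree", 6), rep("Disagree", 3), rep("Don't know", 1))

# 'ignore' style: drop "Don't know" BEFORE calculating, so the denominator is 9
kept <- responses[responses != "Don't know"]
prop_ignore <- 100 * table(kept) / length(kept)
print(round(prop_ignore, 1))  # Agree 66.7, Disagree 33.3

# 'filter' style: calculate on all 10 responses, THEN hide "Don't know"
prop_all <- 100 * table(responses) / length(responses)
prop_filter <- prop_all[c("Agree", "Disagree")]
print(prop_filter)            # Agree 60, Disagree 30
```

The same data gives 66.7% agreement when "Don't know" is ignored, and 60% when it is merely filtered out of the display.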
#### Integrate with ggplot
Unless you've set `unify()`'s `gtTable` argument to `TRUE`, its output is a tibble. This means it integrates neatly into `ggplot()` function calls.
Let's pretend we've already run `unify()` on columns 1, 2 & 3, and assigned it to the name `united`...
```{r united, echo=FALSE, message=FALSE, error=TRUE, results='hide'}
united <- unify(data, c(1, 2, 3),
                Agree = c("Agree", "Strongly agree", "Highly agree",
                          "Somewhat agree"),
                Disagree = c("Disagree", "Strongly disagree", "Highly disagree",
                             "Somewhat disagree"),
                ignore = c("Neither agree nor disagree", "Unsure", "Indifferent",
                           "Don't know", "No opinion", NA))
```
```{r ggplot, error=TRUE}
ggplot(data = united, # ...unify() output becomes ggplot()'s data argument
       aes(x = Question, y = `Agree (Proportion)`, fill = Question)) +
  geom_col() +
  # ...below are just optional customisation options:
  coord_flip() +
  theme_bw() +
  scale_fill_manual(values = c("cornflowerblue", "coral", "chartreuse3")) +
  scale_y_continuous(limits = c(0, 70)) +
  theme(legend.position = "none")
```
## Even more functionality
For other functionality not covered in this document, please run `?unify()` and `?assess()` to view the help files, which cover all function parameters.
## Future plans
- Add support for `stargazer` tables into `unify()`.
- Develop a separate function for analysing multiple choice data for data formats typical of exported survey data.
## Bug reports and feature requests
Please do let me know about any issues you come across. You can use the **Issues** tab in GitHub for any bug reports.
If you have any ideas for improving existing features, or perhaps even for new ones, then I'd love to hear them. Let me know in the **Discussions** tab in GitHub.
I am also open to invitations to collaborate.