Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -606,8 +606,7 @@ fit = glm(death ~ masfem * dam + masfem * zpressure, data = df.filtered, family
```

Although this is a reasonable choice for count data, one can also argue
for a log-linear regression model instead, as count data such as deaths
tend to be approximately log-normally distributed.
for a Gaussian regression model instead, which is sometimes done after log-transforming the sum of the count data (deaths) and $1$.

Specifying this in a multiverse analysis may require changes in two
different locations in the code: the specification of the `deaths`
Expand Down
77 changes: 29 additions & 48 deletions vignettes/example-hurricane.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ title: "Example multiverse implementation: Female hurricanes are deadlier than m
author:
- Abhraneel Sarma, Northwestern University
- Alex Kale, University of Washington
- Michael Shiuming Truong, York University
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
Expand Down Expand Up @@ -31,7 +32,7 @@ library(dplyr)
library(modelr)
library(tidyr)
library(ggplot2)
library(tidybayes)
library(marginaleffects)
library(multiverse)
```

Expand Down Expand Up @@ -73,7 +74,7 @@ hurricane_data <- hurricane |>
```

## Original analysis
We then illustrate an implementation of the original analysis by Jung et al. [https://doi.org/10.1073/pnas.1402786111]. The original analysis used a negative binomial model, which is suitable for overdispersed count data. Due to some issues with model fit with the `MASS::glm.nb` function (see Note 3: https://github.com/uwdata/boba/tree/master/example/hurricane), we instead use the simpler poisson regression model which will ensure that none of the models fail while fitting.
We then illustrate a slightly modified analysis based on the original analysis by Jung et al. [https://doi.org/10.1073/pnas.1402786111]. The original analysis used a negative binomial model, which is suitable for overdispersed count data. Due to some issues with model fit with the `MASS::glm.nb` function (see Note 3: https://github.com/uwdata/boba/tree/master/example/hurricane), we instead use the simpler poisson regression model which will ensure that none of the models fail while fitting, but may produce slightly different results.

In the original analysis, Jung et al. exclude two hurricanes which caused the highest number of deaths (Katrina and Audrey) as outliers. They transform the variable used the interactions between the 11-point femininity rating and both damages and zpressure respectively, as seen below:

Expand Down Expand Up @@ -141,12 +142,12 @@ df <- df |>
````

### Declaring alternative specifications of regression model
The next step is to fit the model. We can use either a log-linear model or a poisson model for the step. Both are reasonable alternatives for this dataset. We also have to make a choice on whether we want to include an interaction between `femininity` and `damage`. This results in the following specification:
The next step is to fit the model. We can use either a Gaussian or Poisson model for the step. (Here, the Gaussian model uses a natural-log transformation of the sum of the response variable `death` and $1$. Generally, the Gaussian model is the maximum-likelihood equivalent of running an ordinary least squares regression, except for estimates like the standard deviation of the residuals.) Both are reasonable alternatives for this dataset. We also have to make a choice on whether we want to include an interaction between `femininity` and `damage`. This results in the following specification:


````
```{multiverse label = default-m-3, inside = M}`r ''`
fit <- glm(branch(model, "linear" ~ log(death + 1), "poisson" ~ death) ~
fit <- glm(branch(model, "gaussian" ~ log(death + 1), "poisson" ~ death) ~
branch(main_interaction,
"no" ~ femininity + damage,
"yes" ~ femininity * damage
Expand All @@ -158,7 +159,7 @@ fit <- glm(branch(model, "linear" ~ log(death + 1), "poisson" ~ death) ~
"all" %when% (main_interaction == "yes") ~ femininity * z3,
"all_no_interaction" %when% (main_interaction == "no") ~ z3
) + branch(covariates, "1" ~ NULL, "2" ~ year:damage, "3" ~ post:damage),
family = branch(model, "linear" ~ "gaussian", "poisson" ~ "poisson"),
family = branch(model, "gaussian" ~ "gaussian", "poisson" ~ "poisson"),
data = df)
```
````
Expand All @@ -167,27 +168,17 @@ Once we have implemented the analysis model in our multiverse, the corresponding

````
```{multiverse label = default-m-4, inside = M}`r ''`
pred <- predict(fit, se.fit = TRUE, type = "response")

pred2expectation <- function(mu, sigma) {
branch(model, "linear" ~ exp(mu + sigma^2/2) - 1, "poisson" ~ mu)
}

disagg_fit <- df |>
mutate(
fitted = pred$fit, # add fitted predictions and standard errors to dataframe
se.fit = pred$se.fit,
deg_f = df.residual(fit), # get degrees of freedom
sigma = sigma(fit), # get residual standard deviation
se.residual = sqrt(sum(residuals(fit)^2) / deg_f) # get residual standard errors
expectation <- marginaleffects::avg_predictions(
model = fit,
by = "female",
type = "response",
transform = branch(
model,
"gaussian" ~ function(x) exp(x) - 1,
"poisson" ~ NULL
)

# aggregate fitted effect of female storm name
expectation <- disagg_fit |>
mutate(expected_deaths = pred2expectation(fitted, sigma)) |>
group_by(female) |>
summarise(mean_deaths = mean(expected_deaths), .groups = "drop_last") |>
compare_levels(mean_deaths, by = female)
)$estimate |>
diff()
```
````

Expand Down Expand Up @@ -223,7 +214,7 @@ inside(M, {
)
)

fit <- glm(branch(model, "linear" ~ log(death + 1), "poisson" ~ death) ~
fit <- glm(branch(model, "gaussian" ~ log(death + 1), "poisson" ~ death) ~
branch(main_interaction,
"no" ~ femininity + damage,
"yes" ~ femininity * damage
Expand All @@ -235,30 +226,20 @@ inside(M, {
"all" %when% (main_interaction == "yes") ~ femininity * z3,
"all_no_interaction" %when% (main_interaction == "no") ~ z3
) + branch(covariates, "1" ~ NULL, "2" ~ year:damage, "3" ~ post:damage),
family = branch(model, "linear" ~ "gaussian", "poisson" ~ "poisson"),
family = branch(model, "gaussian" ~ "gaussian", "poisson" ~ "poisson"),
data = df)

pred <- predict(fit, se.fit = TRUE, type = "response")

pred2expectation <- function(mu, sigma) {
branch(model, "linear" ~ exp(mu + sigma^2/2) - 1, "poisson" ~ mu)
}

disagg_fit <- df |>
mutate(
fitted = pred$fit, # add fitted predictions and standard errors to dataframe
se.fit = pred$se.fit,
deg_f = df.residual(fit), # get degrees of freedom
sigma = sigma(fit), # get residual standard deviation
se.residual = sqrt(sum(residuals(fit)^2) / deg_f) # get residual standard errors
)

# aggregate fitted effect of female storm name
expectation <- disagg_fit |>
mutate(expected_deaths = pred2expectation(fitted, sigma)) |>
group_by(female) |>
summarise(mean_deaths = mean(expected_deaths), .groups = "drop_last") |>
compare_levels(mean_deaths, by = female)
expectation <- marginaleffects::avg_predictions(
model = fit,
by = "female",
type = "response",
transform = branch(
model,
"gaussian" ~ function(x) exp(x) - 1,
"poisson" ~ NULL
)
)$estimate |>
diff()
})
```

Expand Down
2 changes: 1 addition & 1 deletion vignettes/execution-multiverse.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ inside(M, {
```


Alternative specifications of regression model: The next step is to fit the model. We can use either a log-linear model or a poisson model for the step. Both are reasonable alternatives for this dataset. We also have to make a choice on whether we want to include an interaction between `femininity` and `damage`. This results in the following specification:
Alternative specifications of regression model: The next step is to fit the model. We can use either a gaussian model or a poisson model for the step. Both are reasonable alternatives for this dataset. We also have to make a choice on whether we want to include an interaction between `femininity` and `damage`. This results in the following specification:


```{r}
Expand Down
Loading