Issue #548: Refactor the getting started vignette (#549)
* make a plan for the getting started vignette

* update use of ggplot2

* rework data simulation

* write out more getting started detail and fix a bounds bug

* finalise format

* postprocessing

* reword intro

* fix linting issues

* improve intro and add background section

* add a test for naive postprocessing

* add right truncation figure

* add right truncation schematic figure

* fix namespace for ggplot2

* update news

* add primarycensored note

* fix repeated brackets

* check for outstanding gt usage

* linting

* update spacing

* read through issues corrected

* reorder yaml

* use details approach vs not working code folding approach

* try and resolve caption failures by using a single line

* fix FAQ

* revert some overzealous edit changes

* check approx vignette

* hide more plots

* missing allocation

* catch renamed plot
seabbs authored Mar 7, 2025
1 parent f08a6ce commit 39df3d3
Showing 16 changed files with 628 additions and 208 deletions.
7 changes: 4 additions & 3 deletions DESCRIPTION
@@ -26,7 +26,9 @@ Description: Understanding and accurately estimating epidemiological delay
and resource allocation. This package provides methods to address
the key challenges in estimating these distributions, including truncation,
interval censoring, and dynamical biases. These issues are frequently
overlooked, resulting in biased conclusions.
overlooked, resulting in biased conclusions. Built on top of 'brms',
it allows for flexible modelling including time-varying spatial components
and partially pooled estimates of demographic characteristics.
License: MIT + file LICENSE
URL: https://epidist.epinowcast.org/,
https://github.com/epinowcast/epidist/
@@ -38,7 +40,6 @@ Imports:
checkmate,
cli,
dplyr,
ggplot2,
lubridate,
primarycensored,
purrr,
@@ -53,7 +54,7 @@ Suggests:
cmdstanr,
CodeDepends,
collapse,
gt,
ggplot2,
knitr,
loo,
marginaleffects,
1 change: 0 additions & 1 deletion NAMESPACE
@@ -82,7 +82,6 @@ export(simulate_gillespie)
export(simulate_secondary)
export(simulate_uniform_cases)
export(weibull)
import(ggplot2)
importFrom(brms,as.brmsprior)
importFrom(brms,bf)
importFrom(brms,lognormal)
9 changes: 8 additions & 1 deletion NEWS.md
@@ -1,6 +1,6 @@
# epidist 0.2.0.1000

This is the development version of `epidist`.
This release adds support for a wider range of distributions in the marginal model, improves documentation with new vignettes and FAQ sections, enhances the getting started guide with clearer examples of model comparison, and fixes several bugs related to parameter bounds and likelihood calculations.

## Models

@@ -16,11 +16,18 @@ This is the development version of `epidist`.
- Added a new vignette "Guide to the statistical models implemented in epidist". See #514.
- Added a new FAQ section showcasing how to use the `posterior` package with `epidist` models, particularly for working with random variables (`rvars`) to propagate uncertainty in calculations. See #547.
- Added a new FAQ section on how to use the `marginaleffects` package with `epidist` models. See #547.
- Reduced the focus on simulating data in the getting started vignette to make it more accessible. See #549.
- Made the entry to the package friendlier with clearer examples and improved documentation. See #549.
- Added a schematic to explain right truncation more clearly to the getting started vignette. See #549.
- Added a comparison of fitting naive and marginal models in the getting started vignette to highlight the importance of accounting for biases. See #549.
- Added examples showing how to extract estimated parameters and plot them against true values to evaluate model performance. See #549.

## Bugs

- Fixed a vector length issue for censoring that was causing problems in some likelihood calls. See #540.
- Fixed a bug in the preprocessing of the Weibull family. See #540.
- Fixed a bug where bounds were not set for mu parameters in custom families. See #549.
- Fixed a bug in `predict_delay_parameters()` where it couldn't detect brms families when used directly. See #549.

# epidist 0.2.0

2 changes: 0 additions & 2 deletions R/epidist-package.R
@@ -1,8 +1,6 @@
#' @keywords internal
"_PACKAGE"

#' @import ggplot2

## usethis namespace: start
#' @importFrom dplyr filter select
#' @importFrom brms bf prior
10 changes: 10 additions & 0 deletions R/globals.R
@@ -1,9 +1,19 @@
# Generated by roxyglobals: do not edit by hand

utils::globalVariables(c(
".data", # <epidist_diagnostics>
"samples", # <epidist_diagnostics>
".data", # <as_epidist_latent_model.epidist_linelist_data>
"woverlap", # <epidist_stancode.epidist_latent_model>
".data", # <as_epidist_linelist_data.epidist_aggregate_data>
".data", # <as_epidist_marginal_model.epidist_linelist_data>
".data", # <as_epidist_naive_model.epidist_linelist_data>
".data", # <add_mean_sd.lognormal_samples>
".data", # <add_mean_sd.gamma_samples>
".data", # <add_mean_sd.weibull_samples>
"rlnorm", # <simulate_secondary>
".data", # <simulate_secondary>
"fix", # <.replace_prior>
".data", # <.replace_prior>
NULL
))
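For readers unfamiliar with the pattern in `R/globals.R` above: `utils::globalVariables()` registers names that only appear through non-standard evaluation (such as `.data` or data-frame columns used inside pipelines), so `R CMD check` does not flag them as undefined globals. A minimal sketch, with an illustrative function and variable name rather than anything from the package:

```r
# In a package, declare NSE-only names once, e.g. in R/globals.R.
# Without this, R CMD check emits a NOTE such as
# "no visible binding for global variable 'samples'".
utils::globalVariables(c("samples"))

# `samples` below is a column of `df`, resolved at run time by with(),
# not a true global -- exactly the case the declaration covers.
mean_samples <- function(df) {
  with(df, mean(samples))
}
```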
10 changes: 8 additions & 2 deletions R/latent_model.R
@@ -161,8 +161,14 @@ epidist_family_model.epidist_latent_model <- function(
paste0("latent_", family$family),
dpars = family$dpars,
links = c(family$link, family$other_links),
lb = c(NA, as.numeric(lapply(family$other_bounds, "[[", "lb"))),
ub = c(NA, as.numeric(lapply(family$other_bounds, "[[", "ub"))),
lb = c(
as.numeric(family$ybounds[1]),
as.numeric(lapply(family$other_bounds, "[[", "lb"))
),
ub = c(
as.numeric(family$ybounds[2]),
as.numeric(lapply(family$other_bounds, "[[", "ub"))
),
type = family$type,
vars = c(
"vreal1",
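The change above seeds the `lb`/`ub` vectors with the family's response bounds (`ybounds`) instead of `NA`, then appends the bounds of the other distributional parameters. The extraction pattern can be exercised in isolation; the `family` list below is a hypothetical stand-in, not the real `epidist` family object:

```r
# Hypothetical family stub: response bounded on [0, Inf), plus one
# extra distributional parameter (sigma) with lower bound 0.
family <- list(
  ybounds = c(0, Inf),
  other_bounds = list(sigma = list(lb = 0, ub = NA))
)

# First element comes from the response bounds; the remainder are
# pulled out per parameter with lapply(x, "[[", name).
lb <- c(
  as.numeric(family$ybounds[1]),
  as.numeric(lapply(family$other_bounds, "[[", "lb"))
)
ub <- c(
  as.numeric(family$ybounds[2]),
  as.numeric(lapply(family$other_bounds, "[[", "ub"))
)
```

With this stub, `lb` is `c(0, 0)` and `ub` is `c(Inf, NA)`, whereas the previous code always produced `NA` for the first entry.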
10 changes: 8 additions & 2 deletions R/marginal_model.R
@@ -207,8 +207,14 @@ epidist_family_model.epidist_marginal_model <- function(
paste0("marginal_", family$family),
dpars = family$dpars,
links = c(family$link, family$other_links),
lb = c(NA, as.numeric(lapply(family$other_bounds, "[[", "lb"))),
ub = c(NA, as.numeric(lapply(family$other_bounds, "[[", "ub"))),
lb = c(
as.numeric(family$ybounds[1]),
as.numeric(lapply(family$other_bounds, "[[", "lb"))
),
ub = c(
as.numeric(family$ybounds[2]),
as.numeric(lapply(family$other_bounds, "[[", "ub"))
),
type = "int",
vars = c(
"vreal1[n]",
8 changes: 7 additions & 1 deletion R/postprocess.R
@@ -25,8 +25,14 @@ predict_delay_parameters <- function(fit, newdata = NULL, ...) {
lp_dpar <- brms::get_dpar(pp, dpar = dpar, inv_link = TRUE)
samples_df[[dpar]] <- as.vector(lp_dpar)
}
# Use family$name if it exists, otherwise fall back to family
family_name <- if (!is.null(fit$family$name)) {
sub(".*_", "", fit$family$name)
} else {
fit$family$family
}
class(samples_df) <- c(
paste0(sub(".*_", "", fit$family$name), "_samples"),
paste0(family_name, "_samples"),
class(samples_df)
)
samples_df <- add_mean_sd(samples_df)
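The class-naming fallback introduced above can be exercised on its own: `sub(".*_", "", x)` greedily strips everything up to the last underscore, so an internal name like `marginal_lognormal` reduces to the plain family name, and a direct brms fit (which has no `family$name`) falls back to `family$family`. The fit objects below are hypothetical stubs standing in for real `brmsfit` results:

```r
# Strip the model prefix from an internal family name, falling back
# to the plain brms family when no prefixed name exists.
family_label <- function(fit) {
  if (!is.null(fit$family$name)) {
    sub(".*_", "", fit$family$name)  # e.g. "marginal_lognormal" -> "lognormal"
  } else {
    fit$family$family                # brms family used directly
  }
}

fit_epidist <- list(family = list(name = "marginal_lognormal"))
fit_brms <- list(family = list(family = "lognormal"))

family_label(fit_epidist)  # "lognormal"
family_label(fit_brms)     # "lognormal"
```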
2 changes: 1 addition & 1 deletion _pkgdown.yml
@@ -8,7 +8,7 @@ development:

navbar:
structure:
left: [faq, intro, reference, articles, news]
left: [intro, faq, reference, articles, news]
right: [search, github, lightswitch]
components:
faq:
2 changes: 1 addition & 1 deletion man/epidist-package.Rd


12 changes: 12 additions & 0 deletions tests/testthat/setup.R
@@ -198,6 +198,18 @@ if (not_on_cran()) {
backend = "cmdstanr"
))

cli::cli_alert_info("Compiling the naive model with cmdstanr")
fit_naive <- epidist(
data = prep_naive_obs,
seed = 1,
chains = 2,
cores = 2,
silent = 2,
refresh = 0,
iter = 1000,
backend = "cmdstanr"
)

cli::cli_alert_info(
"Compiling the latent model with cmdstanr and a gamma dist"
)
23 changes: 21 additions & 2 deletions tests/testthat/test-postprocess.R
@@ -15,6 +15,7 @@ test_that(
expect_true(all(pred$sd > 0))
expect_length(unique(pred$index), expected_rows)
expect_length(unique(pred$draw), summary(fit)$total_ndraws)
return(invisible(NULL))
}

# Test latent and marginal models
@@ -23,14 +24,31 @@ }
}
)

test_that(
"predict_delay_parameters works with the naive lognormal model",
{
skip_on_cran()

# Test naive model predictions
set.seed(1)
pred_naive <- predict_delay_parameters(fit_naive)
expect_s3_class(pred_naive, "lognormal_samples")
expect_s3_class(pred_naive, "data.frame")
expect_named(pred_naive, c("draw", "index", "mu", "sigma", "mean", "sd"))
expect_true(all(pred_naive$mean > 0))
expect_true(all(pred_naive$sd > 0))
expect_length(unique(pred_naive$draw), summary(fit_naive)$total_ndraws)
}
)


test_that("predict_delay_parameters accepts newdata arguments and prediction by sex recovers underlying parameters", { # nolint: line_length_linter.
skip_on_cran()

# Helper function to test sex predictions
test_sex_predictions <- function(fit, prep = prep_obs_sex) {
set.seed(1)
prep <- prep |>
dplyr::mutate(.row_id = dplyr::row_number())
prep <- dplyr::mutate(prep, .row_id = dplyr::row_number())
pred_sex <- predict_delay_parameters(fit, prep)
expect_s3_class(pred_sex, "lognormal_samples")
expect_s3_class(pred_sex, "data.frame")
@@ -64,6 +82,7 @@ test_that("predict_delay_parameters accepts newdata arguments and prediction by
c(meanlog_f, sdlog_f),
tolerance = 0.1
)
return(invisible(NULL))
}

# Test latent and marginal models
60 changes: 41 additions & 19 deletions vignettes/approx-inference.Rmd
@@ -93,7 +93,6 @@ In this demonstration, we use the following packages:
library(epidist)
library(ggplot2)
library(dplyr)
library(gt)
library(purrr)
library(tidyr)
library(tibble)
@@ -112,6 +111,8 @@ cmdstanr::cmdstan_version()
We can simulate data to use for fitting models.
The example data simulation process follows that used in the [Getting started with epidist](https://epidist.epinowcast.org/articles/epidist.html#data) vignette, so we will not detail exactly what is happening here, but please consult that vignette if interested:

<details><summary>Click to expand for data simulation code</summary>

```{r}
meanlog <- 1.8
sdlog <- 0.5
@@ -134,6 +135,8 @@ obs_cens_trunc_samp <- simulate_gillespie(seed = 101) |>
slice_sample(n = sample_size, replace = FALSE)
```

</details>

We now prepare the data for fitting with the marginal model, and perform inference with HMC:

```{r results='hide'}
@@ -187,6 +190,8 @@ time_pathfinder <- proc.time() - t
We now extract the posterior distributions of the delay parameters from the fitted model for each inference method.
Thankfully, each algorithm is implemented to sample draws from the posterior distribution, and so post-processing is simple.

<details><summary>Click to expand for code to extract posterior draws</summary>

```{r}
fits <- list(
HMC = fit_hmc,
@@ -196,7 +201,8 @@ fits <- list(
)
draws <- imap(fits, function(fit, name) {
predict_delay_parameters(fit) |>
draws <- fit |>
predict_delay_parameters() |>
as.data.frame() |>
pivot_longer(
cols = c("mu", "sigma", "mean", "sd"),
@@ -205,11 +211,14 @@
) |>
filter(parameter %in% c("mu", "sigma")) |>
mutate(method = as.factor(name))
return(draws)
})
draws <- bind_rows(draws)
```

</details>

## Comparison of parameter posterior distributions

The mean estimated value of each parameter, from each method, is as follows.
@@ -218,19 +227,18 @@
pars <- draws |>
group_by(method, parameter) |>
summarise(value = mean(value)) |>
pivot_wider(names_from = parameter, values_from = value)
pivot_wider(names_from = parameter, values_from = value) |>
ungroup()
pars |>
ungroup() |>
gt()
pars
```

More comprehensively, the estimated posterior distributions are shown in Figure \@ref(fig:posterior).

(ref:posterior) Estimated posterior distributions for the `mu` and `sigma` parameters using each inference method, shown using `tidybayes::stat_slabinterval()`.
<details><summary>Click to expand for code to create posterior distribution plot</summary>

```{r posterior, fig.cap="(ref:posterior)"}
draws |>
```{r}
p_posterior <- draws |>
ggplot(aes(x = value, col = method)) +
stat_slabinterval(density = "histogram", breaks = 30, alpha = 0.8) +
scale_colour_manual(values = c("#56B4E9", "#009E73", "#E69F00", "#CC79A7")) +
@@ -241,24 +249,41 @@
theme(legend.position = "bottom")
```

</details>

(ref:posterior) Estimated posterior distributions for the `mu` and `sigma` parameters using each inference method, shown using `tidybayes::stat_slabinterval()`.

```{r posterior, fig.cap="(ref:posterior)"}
p_posterior
```

## Comparison of resulting delay distributions

Figure \@ref(fig:delay-pdf) shows how the different `mu` and `sigma` posterior mean estimates from each inference method alter an estimated delay distribution.

(ref:delay-pdf) Delay probability density functions obtained based on the posterior mean estimated `mu` and `sigma` parameters.
<details><summary>Click to expand for code to create delay PDF plot</summary>

```{r delay-pdf, fig.cap="(ref:delay-pdf)"}
pmap_df(
```{r}
p_delay_pdf <- pmap_df(
filter(pars), ~ tibble(
x = seq(0, 25, by = 0.1),
method = ..1, density = dlnorm(x, ..2, ..3)
)
) |>
ggplot(aes(x = x, y = density, col = method)) +
geom_line() +
geom_line(linewidth = 1) +
scale_color_manual(values = c("#56B4E9", "#009E73", "#E69F00", "#CC79A7")) +
theme_minimal() +
labs(x = "", y = "", col = "Method")
labs(x = "", y = "", col = "Method") +
theme(legend.position = "bottom")
```

</details>

(ref:delay-pdf) Delay probability density functions obtained based on the posterior mean estimated `mu` and `sigma` parameters.

```{r delay-pdf, fig.cap="(ref:delay-pdf)"}
p_delay_pdf
```
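For readers who want the density comparison without the plotting code, it reduces to evaluating `dlnorm()` on a grid of delay values for each method's posterior-mean `mu` and `sigma`. A minimal sketch with made-up parameter values (the real values come from the `pars` table above):

```r
# Hypothetical posterior-mean parameters for two methods; in the
# vignette these come from the fitted models.
pars_sketch <- data.frame(
  method = c("HMC", "Laplace"),
  mu = c(1.80, 1.75),
  sigma = c(0.50, 0.55)
)

# Evaluate the lognormal delay density on a common grid per method.
x <- seq(0, 25, by = 0.1)
densities <- do.call(rbind, lapply(seq_len(nrow(pars_sketch)), function(i) {
  data.frame(
    method = pars_sketch$method[i],
    x = x,
    density = dlnorm(x, pars_sketch$mu[i], pars_sketch$sigma[i])
  )
}))
```

The resulting long-format data frame (one row per method and grid point) is what `geom_line()` draws in the figure.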

## Comparison of time taken
Expand All @@ -274,10 +299,7 @@ times <- list(
Pathfinder = time_pathfinder
)
times |>
map_dbl("elapsed") |>
enframe(name = "method", value = "time (s)") |>
gt()
times
```

# Conclusion
@@ -287,4 +309,4 @@ We found that these algorithms do produce reasonable approximations in far less
Of course, this vignette only includes one example, and a more thorough investigation would be required to make specific recommendations.
That said, currently we do not recommend use of the Pathfinder algorithm due to its unstable performance in our testing and its early-stage software implementation.

## Bibliography {-}
## References {-}
