Issue #548: Refactor the getting started vignette (#549)
* make a plan for the getting started vignette

* update use of ggplot2

* rework data simulation

* write out more getting started detail and fix a bounds bug

* finalise format

* postprocessing

* reword intro

* fix linting issues

* improve intro and add background section

* add a test for naive postprocessing

* add right truncation figure

* add right truncation schematic figure

* fix namespace for ggplot2

* update news

* add primarycensored note

* fix repeated brackets

* check for outstanding gt usage

* linting

* update spacing

* read through issues corrected

* reorder yaml

* use details approach vs not working code folding approach

* try and resolve caption failures by using a single line

* fix FAQ

* revert some overzealous edit changes

* check approx vignette

* hide more plots

* missing allocation

* catch renamed plot
seabbs authored Mar 7, 2025
1 parent f08a6ce commit 39df3d3
Showing 16 changed files with 628 additions and 208 deletions.
7 changes: 4 additions & 3 deletions DESCRIPTION
@@ -26,7 +26,9 @@ Description: Understanding and accurately estimating epidemiological delay
and resource allocation. This package provides methods to address
the key challenges in estimating these distributions, including truncation,
interval censoring, and dynamical biases. These issues are frequently
overlooked, resulting in biased conclusions.
overlooked, resulting in biased conclusions. Built on top of 'brms',
it allows for flexible modelling including time-varying spatial components
and partially pooled estimates of demographic characteristics.
License: MIT + file LICENSE
URL: https://epidist.epinowcast.org/,
https://github.com/epinowcast/epidist/
@@ -38,7 +40,6 @@ Imports:
checkmate,
cli,
dplyr,
ggplot2,
lubridate,
primarycensored,
purrr,
@@ -53,7 +54,7 @@ Suggests:
cmdstanr,
CodeDepends,
collapse,
gt,
ggplot2,
knitr,
loo,
marginaleffects,
1 change: 0 additions & 1 deletion NAMESPACE
@@ -82,7 +82,6 @@ export(simulate_gillespie)
export(simulate_secondary)
export(simulate_uniform_cases)
export(weibull)
import(ggplot2)
importFrom(brms,as.brmsprior)
importFrom(brms,bf)
importFrom(brms,lognormal)
9 changes: 8 additions & 1 deletion NEWS.md
@@ -1,6 +1,6 @@
# epidist 0.2.0.1000

This is the development version of `epidist`.
This release adds support for a wider range of distributions in the marginal model, improves documentation with new vignettes and FAQ sections, enhances the getting started guide with clearer examples of model comparison, and fixes several bugs related to parameter bounds and likelihood calculations.

## Models

@@ -16,11 +16,18 @@ This is the development version of `epidist`.
- Added a new vignette "Guide to the statistical models implemented in epidist". See #514.
- Added a new FAQ section showcasing how to use the `posterior` package with `epidist` models, particularly for working with random variables (`rvars`) to propagate uncertainty in calculations. See #547.
- Added a new FAQ section on how to use the `marginaleffects` package with `epidist` models. See #547.
- Reduced the focus on simulating data in the getting started vignette to make it more accessible. See #549.
- Made the entry to the package friendlier with clearer examples and improved documentation. See #549.
- Added a schematic to explain right truncation more clearly to the getting started vignette. See #549.
- Added a comparison of fitting naive and marginal models in the getting started vignette to highlight the importance of accounting for biases. See #549.
- Added examples showing how to extract estimated parameters and plot them against true values to evaluate model performance. See #549.

## Bugs

- Fixed a vector length issue for censoring that was causing problems in some likelihood calls. See #540.
- Fixed a bug in the preprocessing of the Weibull family. See #540.
- Fixed a bug where bounds were not set for mu parameters in custom families. See #549.
- Fixed a bug in `predict_delay_parameters()` where it couldn't detect brms families when used directly. See #549.

# epidist 0.2.0

2 changes: 0 additions & 2 deletions R/epidist-package.R
@@ -1,8 +1,6 @@
#' @keywords internal
"_PACKAGE"

#' @import ggplot2

## usethis namespace: start
#' @importFrom dplyr filter select
#' @importFrom brms bf prior
10 changes: 10 additions & 0 deletions R/globals.R
@@ -1,9 +1,19 @@
# Generated by roxyglobals: do not edit by hand

utils::globalVariables(c(
".data", # <epidist_diagnostics>
"samples", # <epidist_diagnostics>
".data", # <as_epidist_latent_model.epidist_linelist_data>
"woverlap", # <epidist_stancode.epidist_latent_model>
".data", # <as_epidist_linelist_data.epidist_aggregate_data>
".data", # <as_epidist_marginal_model.epidist_linelist_data>
".data", # <as_epidist_naive_model.epidist_linelist_data>
".data", # <add_mean_sd.lognormal_samples>
".data", # <add_mean_sd.gamma_samples>
".data", # <add_mean_sd.weibull_samples>
"rlnorm", # <simulate_secondary>
".data", # <simulate_secondary>
"fix", # <.replace_prior>
".data", # <.replace_prior>
NULL
))
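For readers unfamiliar with the pattern in `R/globals.R` above: `utils::globalVariables()` registers names that only appear through non-standard evaluation (such as `.data` or data-frame columns used inside pipelines), so `R CMD check` does not flag them as undefined globals. A minimal sketch, with an illustrative function and variable name rather than anything from the package:

```r
# In a package, declare NSE-only names once, e.g. in R/globals.R.
# Without this, R CMD check emits a NOTE such as
# "no visible binding for global variable 'samples'".
utils::globalVariables(c("samples"))

# `samples` below is a column of `df`, resolved at run time by with(),
# not a true global -- exactly the case the declaration covers.
mean_samples <- function(df) {
  with(df, mean(samples))
}
```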
10 changes: 8 additions & 2 deletions R/latent_model.R
@@ -161,8 +161,14 @@ epidist_family_model.epidist_latent_model <- function(
paste0("latent_", family$family),
dpars = family$dpars,
links = c(family$link, family$other_links),
lb = c(NA, as.numeric(lapply(family$other_bounds, "[[", "lb"))),
ub = c(NA, as.numeric(lapply(family$other_bounds, "[[", "ub"))),
lb = c(
as.numeric(family$ybounds[1]),
as.numeric(lapply(family$other_bounds, "[[", "lb"))
),
ub = c(
as.numeric(family$ybounds[2]),
as.numeric(lapply(family$other_bounds, "[[", "ub"))
),
type = family$type,
vars = c(
"vreal1",
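The change above seeds the `lb`/`ub` vectors with the family's response bounds (`ybounds`) instead of `NA`, then appends the bounds of the other distributional parameters. The extraction pattern can be exercised in isolation; the `family` list below is a hypothetical stand-in, not the real `epidist` family object:

```r
# Hypothetical family stub: response bounded on [0, Inf), plus one
# extra distributional parameter (sigma) with lower bound 0.
family <- list(
  ybounds = c(0, Inf),
  other_bounds = list(sigma = list(lb = 0, ub = NA))
)

# First element comes from the response bounds; the remainder are
# pulled out per parameter with lapply(x, "[[", name).
lb <- c(
  as.numeric(family$ybounds[1]),
  as.numeric(lapply(family$other_bounds, "[[", "lb"))
)
ub <- c(
  as.numeric(family$ybounds[2]),
  as.numeric(lapply(family$other_bounds, "[[", "ub"))
)
```

With this stub, `lb` is `c(0, 0)` and `ub` is `c(Inf, NA)`, whereas the previous code always produced `NA` for the first entry.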
10 changes: 8 additions & 2 deletions R/marginal_model.R
@@ -207,8 +207,14 @@ epidist_family_model.epidist_marginal_model <- function(
paste0("marginal_", family$family),
dpars = family$dpars,
links = c(family$link, family$other_links),
lb = c(NA, as.numeric(lapply(family$other_bounds, "[[", "lb"))),
ub = c(NA, as.numeric(lapply(family$other_bounds, "[[", "ub"))),
lb = c(
as.numeric(family$ybounds[1]),
as.numeric(lapply(family$other_bounds, "[[", "lb"))
),
ub = c(
as.numeric(family$ybounds[2]),
as.numeric(lapply(family$other_bounds, "[[", "ub"))
),
type = "int",
vars = c(
"vreal1[n]",
8 changes: 7 additions & 1 deletion R/postprocess.R
@@ -25,8 +25,14 @@ predict_delay_parameters <- function(fit, newdata = NULL, ...) {
lp_dpar <- brms::get_dpar(pp, dpar = dpar, inv_link = TRUE)
samples_df[[dpar]] <- as.vector(lp_dpar)
}
# Use family$name if it exists, otherwise fall back to family
family_name <- if (!is.null(fit$family$name)) {
sub(".*_", "", fit$family$name)
} else {
fit$family$family
}
class(samples_df) <- c(
paste0(sub(".*_", "", fit$family$name), "_samples"),
paste0(family_name, "_samples"),
class(samples_df)
)
samples_df <- add_mean_sd(samples_df)
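The class-naming fallback introduced above can be exercised on its own: `sub(".*_", "", x)` greedily strips everything up to the last underscore, so an internal name like `marginal_lognormal` reduces to the plain family name, and a direct brms fit (which has no `family$name`) falls back to `family$family`. The fit objects below are hypothetical stubs standing in for real `brmsfit` results:

```r
# Strip the model prefix from an internal family name, falling back
# to the plain brms family when no prefixed name exists.
family_label <- function(fit) {
  if (!is.null(fit$family$name)) {
    sub(".*_", "", fit$family$name)  # e.g. "marginal_lognormal" -> "lognormal"
  } else {
    fit$family$family                # brms family used directly
  }
}

fit_epidist <- list(family = list(name = "marginal_lognormal"))
fit_brms <- list(family = list(family = "lognormal"))

family_label(fit_epidist)  # "lognormal"
family_label(fit_brms)     # "lognormal"
```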
2 changes: 1 addition & 1 deletion _pkgdown.yml
@@ -8,7 +8,7 @@ development:

navbar:
structure:
left: [faq, intro, reference, articles, news]
left: [intro, faq, reference, articles, news]
right: [search, github, lightswitch]
components:
faq:
2 changes: 1 addition & 1 deletion man/epidist-package.Rd


12 changes: 12 additions & 0 deletions tests/testthat/setup.R
@@ -198,6 +198,18 @@ if (not_on_cran()) {
backend = "cmdstanr"
))

cli::cli_alert_info("Compiling the naive model with cmdstanr")
fit_naive <- epidist(
data = prep_naive_obs,
seed = 1,
chains = 2,
cores = 2,
silent = 2,
refresh = 0,
iter = 1000,
backend = "cmdstanr"
)

cli::cli_alert_info(
"Compiling the latent model with cmdstanr and a gamma dist"
)
23 changes: 21 additions & 2 deletions tests/testthat/test-postprocess.R
@@ -15,6 +15,7 @@ test_that(
expect_true(all(pred$sd > 0))
expect_length(unique(pred$index), expected_rows)
expect_length(unique(pred$draw), summary(fit)$total_ndraws)
return(invisible(NULL))
}

# Test latent and marginal models
@@ -23,14 +24,31 @@ }
}
)

test_that(
"predict_delay_parameters works with the naive lognormal model",
{
skip_on_cran()

# Test naive model predictions
set.seed(1)
pred_naive <- predict_delay_parameters(fit_naive)
expect_s3_class(pred_naive, "lognormal_samples")
expect_s3_class(pred_naive, "data.frame")
expect_named(pred_naive, c("draw", "index", "mu", "sigma", "mean", "sd"))
expect_true(all(pred_naive$mean > 0))
expect_true(all(pred_naive$sd > 0))
expect_length(unique(pred_naive$draw), summary(fit_naive)$total_ndraws)
}
)


test_that("predict_delay_parameters accepts newdata arguments and prediction by sex recovers underlying parameters", { # nolint: line_length_linter.
skip_on_cran()

# Helper function to test sex predictions
test_sex_predictions <- function(fit, prep = prep_obs_sex) {
set.seed(1)
prep <- prep |>
dplyr::mutate(.row_id = dplyr::row_number())
prep <- dplyr::mutate(prep, .row_id = dplyr::row_number())
pred_sex <- predict_delay_parameters(fit, prep)
expect_s3_class(pred_sex, "lognormal_samples")
expect_s3_class(pred_sex, "data.frame")
@@ -64,6 +82,7 @@ test_that("predict_delay_parameters accepts newdata arguments and prediction by
c(meanlog_f, sdlog_f),
tolerance = 0.1
)
return(invisible(NULL))
}

# Test latent and marginal models
60 changes: 41 additions & 19 deletions vignettes/approx-inference.Rmd
@@ -93,7 +93,6 @@ In this demonstration, we use the following packages:
library(epidist)
library(ggplot2)
library(dplyr)
library(gt)
library(purrr)
library(tidyr)
library(tibble)
@@ -112,6 +111,8 @@ cmdstanr::cmdstan_version()
We can simulate data to use for fitting models.
The example data simulation process follows that used in the [Getting started with epidist](https://epidist.epinowcast.org/articles/epidist.html#data) vignette, so we will not detail exactly what is happening here, but please consult that vignette if interested:

<details><summary>Click to expand for data simulation code</summary>

```{r}
meanlog <- 1.8
sdlog <- 0.5
@@ -134,6 +135,8 @@ obs_cens_trunc_samp <- simulate_gillespie(seed = 101) |>
slice_sample(n = sample_size, replace = FALSE)
```

</details>

We now prepare the data for fitting with the marginal model, and perform inference with HMC:

```{r results='hide'}
@@ -187,6 +190,8 @@ time_pathfinder <- proc.time() - t
We now extract the posterior distributions of the delay parameters from the fitted model for each inference method.
Thankfully, each algorithm is implemented to sample draws from the posterior distribution, and so post-processing is simple.

<details><summary>Click to expand for code to extract posterior draws</summary>

```{r}
fits <- list(
HMC = fit_hmc,
@@ -196,7 +201,8 @@ fits <- list(
)
draws <- imap(fits, function(fit, name) {
predict_delay_parameters(fit) |>
draws <- fit |>
predict_delay_parameters() |>
as.data.frame() |>
pivot_longer(
cols = c("mu", "sigma", "mean", "sd"),
@@ -205,11 +211,14 @@
) |>
filter(parameter %in% c("mu", "sigma")) |>
mutate(method = as.factor(name))
return(draws)
})
draws <- bind_rows(draws)
```

</details>

## Comparison of parameter posterior distributions

The mean estimated value of each parameter, from each method, is as follows.
@@ -218,19 +227,18 @@
pars <- draws |>
group_by(method, parameter) |>
summarise(value = mean(value)) |>
pivot_wider(names_from = parameter, values_from = value)
pivot_wider(names_from = parameter, values_from = value) |>
ungroup()
pars |>
ungroup() |>
gt()
pars
```

More comprehensively, the estimated posterior distributions are shown in Figure \@ref(fig:posterior).

(ref:posterior) Estimated posterior distributions for the `mu` and `sigma` parameters using each inference method, shown using `tidybayes::stat_slabinterval()`.
<details><summary>Click to expand for code to create posterior distribution plot</summary>

```{r posterior, fig.cap="(ref:posterior)"}
draws |>
```{r}
p_posterior <- draws |>
ggplot(aes(x = value, col = method)) +
stat_slabinterval(density = "histogram", breaks = 30, alpha = 0.8) +
scale_colour_manual(values = c("#56B4E9", "#009E73", "#E69F00", "#CC79A7")) +
@@ -241,24 +249,41 @@
theme(legend.position = "bottom")
```

</details>

(ref:posterior) Estimated posterior distributions for the `mu` and `sigma` parameters using each inference method, shown using `tidybayes::stat_slabinterval()`.

```{r posterior, fig.cap="(ref:posterior)"}
p_posterior
```

## Comparison of resulting delay distributions

Figure \@ref(fig:delay-pdf) shows how the different `mu` and `sigma` posterior mean estimates from each inference method alter an estimated delay distribution.

(ref:delay-pdf) Delay probability density functions obtained based on the posterior mean estimated `mu` and `sigma` parameters.
<details><summary>Click to expand for code to create delay PDF plot</summary>

```{r delay-pdf, fig.cap="(ref:delay-pdf)"}
pmap_df(
```{r}
p_delay_pdf <- pmap_df(
filter(pars), ~ tibble(
x = seq(0, 25, by = 0.1),
method = ..1, density = dlnorm(x, ..2, ..3)
)
) |>
ggplot(aes(x = x, y = density, col = method)) +
geom_line() +
geom_line(linewidth = 1) +
scale_color_manual(values = c("#56B4E9", "#009E73", "#E69F00", "#CC79A7")) +
theme_minimal() +
labs(x = "", y = "", col = "Method")
labs(x = "", y = "", col = "Method") +
theme(legend.position = "bottom")
```

</details>

(ref:delay-pdf) Delay probability density functions obtained based on the posterior mean estimated `mu` and `sigma` parameters.

```{r delay-pdf, fig.cap="(ref:delay-pdf)"}
p_delay_pdf
```
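For readers who want the density comparison without the plotting code, it reduces to evaluating `dlnorm()` on a grid of delay values for each method's posterior-mean `mu` and `sigma`. A minimal sketch with made-up parameter values (the real values come from the `pars` table above):

```r
# Hypothetical posterior-mean parameters for two methods; in the
# vignette these come from the fitted models.
pars_sketch <- data.frame(
  method = c("HMC", "Laplace"),
  mu = c(1.80, 1.75),
  sigma = c(0.50, 0.55)
)

# Evaluate the lognormal delay density on a common grid per method.
x <- seq(0, 25, by = 0.1)
densities <- do.call(rbind, lapply(seq_len(nrow(pars_sketch)), function(i) {
  data.frame(
    method = pars_sketch$method[i],
    x = x,
    density = dlnorm(x, pars_sketch$mu[i], pars_sketch$sigma[i])
  )
}))
```

The resulting long-format data frame (one row per method and grid point) is what `geom_line()` draws in the figure.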

## Comparison of time taken
Expand All @@ -274,10 +299,7 @@ times <- list(
Pathfinder = time_pathfinder
)
times |>
map_dbl("elapsed") |>
enframe(name = "method", value = "time (s)") |>
gt()
times
```

# Conclusion
@@ -287,4 +309,4 @@ We found that these algorithms do produce reasonable approximations in far less
Of course, this vignette only includes one example, and a more thorough investigation would be required to make specific recommendations.
That said, currently we do not recommend use of the Pathfinder algorithm due to its unstable performance in our testing and its early-stage software implementation.

## Bibliography {-}
## References {-}
