Skip to content

Conversation

@fweber144
Copy link
Collaborator

@fweber144 fweber144 commented Jul 26, 2025

This adds support for censored observations (i.e., survival or time-to-event analysis) when using the latent projection. Originally, #2 already mentioned that support for time-to-event models would be a nice feature, and the request came up again recently in https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275. This PR only makes it possible to use the latent projection for time-to-event models. For a more general perspective, see #2.

The solution presented here is not optimal because it would probably be more desirable to allow for an extension of the formula in init_refmodel() (similarly to brms's resp_cens() term), see https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275/10. Via such an extension of the init_refmodel() formula, we could also handle observation weights in a better way.

In contrast to the less elegant solution suggested in https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275/10, the solution presented here requires users to add a right-hand side formula as an attribute called cens_var to the latent_ll_oscale function. The variable named in that formula is then retrieved (internally, whenever calling the latent_ll_oscale function) from the original dataset (possibly subsetted to the observations from a given test set), newdata, or element data from varsel()'s argument d_test, whichever is applicable in the specific situation where the latent_ll_oscale function is called. The content of the retrieved variable is then passed to argument cens of the latent_ll_oscale function. In other words, the variable mentioned in the cens_var formula needs to contain the censoring indicators (e.g., 0 = uncensored, 1 = censored) which can then be used (by the user) within the latent_ll_oscale function. On the Stan Forums, I will update the Weibull (and perhaps also the log-normal) example from https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275 to illustrate this.

The solution presented here is still less elegant than the brms::resp_cens()-like solution, but in my opinion, it is preferable over the less elegant solution I suggested in https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275/10. Via the cens_var attribute, we also catch the censored-observations case (when using the latent projection) in validate_vsel_object_stats() to throw a warning if a performance statistic other than "elpd", "mlpd", or "gmpd" is used.

This is only a draft PR. Still to do:

@avehtari: A few weeks ago, we discussed how we should make this feature available and we decided to keep this feature in a separate branch. Thinking over this again, I would say we could merge this into master as soon as the to-do's from above are completed. The reason is that I consider the implementation presented here not as "hacky" as the solution we discussed back then (which was the less elegant solution I suggested in https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275/10). But I'll leave the decision up to you and @aloctavodia what you want to do with this PR. Unfortunately, I was not able to request a review from @aloctavodia for this PR.

@fweber144 fweber144 requested a review from avehtari July 26, 2025 20:09
@fweber144
Copy link
Collaborator Author

On the Stan Forums, I will update the Weibull (and perhaps also the log-normal) example from https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275 to illustrate this.

Added now at https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275/23 (for the Weibull case).

(add argument `cens` to internal `latent_ll_oscale` functions)
@fweber144
Copy link
Collaborator Author

fweber144 commented Jul 27, 2025

I forgot to add the new argument cens to internal latent_ll_oscale functions, but this is now done in commit 6b76269. The tests are running through now (i.e., they don't result in errors).

@avehtari
Copy link
Member

Thinking over this again, I would say we could merge this into master as soon as the to-do's from above are completed. The reason is that I consider the implementation presented here not as "hacky" as the solution we discussed back then

OK for me to merge when it's ready

@fweber144
Copy link
Collaborator Author

@avehtari do you know someone who could finish this PR (the four to-do's listed above)?

@fweber144
Copy link
Collaborator Author

@aloctavodia
Copy link
Collaborator

I could give it a try. Do you prefer to merge this and do the rest in a separated PR? Or add everything here?

@fweber144
Copy link
Collaborator Author

I think I would prefer to add everything here so this PR is self-contained.

@fweber144
Copy link
Collaborator Author

@aloctavodia I saw you opened PR #537 and there updated the package version to a release version (i.e., a non-development version). Does that mean you are planning a CRAN release soon? If yes, do you think it would be possible to include this PR in that CRAN release?

@aloctavodia
Copy link
Collaborator

As the main goal of the PR (and release) is to unlock the publication of the package mclogit that our tests are currently blocking, I would prefer to do the release ASAP. I am sorry that I delayed working on this. I will end this PR next week, and then I also want to work on making the test more robust, as I have been having issues running them locally.

@fweber144
Copy link
Collaborator Author

Ok, I understand. The tests are running through on my machine without problems, but we've realized at several occasions in the past that the tests are quite instable as they depend on rstanarm and brms models being fitted at their beginning. That's why I deactivated most of the tests on CRAN (in #537, it looks like you modified tests that are indeed run on CRAN, so I understand that a new version of mclogit might face problems when running our tests). Unfortunately, I haven't had the time to revise our tests to make them more robust during the time when I was working on projpred. Thank you for working on this now.

censored-observations examples

First, we change the censoring times to avoid
problems in the log-normal example (there, it seems we had too many censored
observations).

Second, we change the true `sdlog` parameter to avoid
problems in the log-normal example (there, it seems we had a too right-skewed
distribution).

Third, we remove the `set.seed()` call because we already have one at the
beginning of the vignette.
… `R CMD check`:

```
process chunk of elements [...] is [...] MiB. This exceeds the maximum allowed
size 500.00 MiB per by R option "future.globals.maxSize". This limit is set to
protect against transfering too large objects to parallel workers by mistake,
which may not be intended and could be costly. See
help("future.globals.maxSize", package = "future") for further explainations and
how to adjust or remove this threshold The three largest globals are 'one_obs'
([...] MiB of class 'function'), 'refmodel' ([...] MiB of class 'list') and 'lw'
([...] MiB of class 'numeric')
```
Hence, switch back to the `doParallel` backend.
(this is only a very basic test for `latent_ll_oscale` because we have more
comprehensive examples with simulated data in the latent-projection vignette and
they are run when `R CMD check` is run)
…unctions

(to check that arguments added after the initial latent-projection release
(v2.4.0) have indeed been added (or taken into account via `...`) by users)
@fweber144
Copy link
Collaborator Author

I just pushed commits completing this PR. From my side, it is ready to be merged, but check it if you like.

I refrained from updating the package description in the DESCRIPTION file because I prefer the general wording that we currently have and because Catalina, Bürkner, and Vehtari (2021, https://doi.org/10.48550/arXiv.2109.04702) already deal with time-to-event models, so the existing reference to that preprint should already be a sufficient reference to time-to-event models.

Note that roxygen2 version 7.3.3 seems to have a bug that replaces cross-links to summary() by cross-links to cmdstanr::summary(). I unstaged these changes after each re-documentation, but this is a bit cumbersome. If you prefer, you can always switch back to roxygen2 v7.3.2 (personally, I have installed v7.3.3 and didn't want to downgrade for this PR).

R CMD check runs through on my machine (with R 4.5.1).

@fweber144 fweber144 marked this pull request as ready for review November 1, 2025 05:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants