OptSurvCutR (Optimal Survival Cut-points) is an R package for optimising cut-points in survival analysis, designed for biostatisticians analysing time-to-event data with continuous predictors (e.g., virome abundances in TCGA datasets). It provides a robust workflow to determine the optimal number and location of cut-points, moving beyond median splits to capture non-linear relationships (e.g., U-shaped effects).
- Beyond Median Splits: Selects the optimal number of cut-points using AIC, AICc, or BIC and then locates them, revealing complex predictor effects.
- Complete Workflow: Integrates `find_cutpoint_number()`, `find_cutpoint()`, and `validate_cutpoint()` for end-to-end analysis.
- Flexible Algorithms: Offers a systematic grid search and a genetic algorithm (via `rgenoud`) for efficient multi-cut optimisation.
- Robust Validation: Assesses cut-point stability using bootstrap resampling, providing 95% confidence intervals to gauge reliability.
- User-Friendly: Provides clear S3 methods (`print`, `summary`, `plot`) for easy interpretation of results.
You can install the development version of OptSurvCutR from GitHub. Note that the genetic algorithm (method = "genetic") requires the rgenoud package, which should be installed separately from CRAN if you plan to use it.
```r
# Install dependencies (rgenoud is only needed for method = "genetic";
# dplyr is used in the quick-start example)
install.packages(c("remotes", "rgenoud", "survival", "dplyr"))

# Install OptSurvCutR
remotes::install_github("paytonyau/OptSurvCutR")
```

Here is a short example demonstrating the core workflow using the built-in colorectal cancer virome dataset.
```r
# Load necessary packages
library(OptSurvCutR)
library(dplyr)
library(survival)

# --- 1. Load and prepare the built-in CRC dataset ---
data("crc_virome", package = "OptSurvCutR")

# A quick preparation to make the status column numeric (0 = LIVING, 1 = DECEASED)
crc_data <- crc_virome %>%
  select(
    time = time_months,
    status_char = status,
    Enterovirus
  ) %>%
  mutate(
    status = as.numeric(substr(status_char, 1, 1))
  ) %>%
  # Remove any rows with missing data
  na.omit()
```
```r
# --- 2. Find the optimal NUMBER of cut-points ---
# We will test for 0, 1, or 2 cuts using a fast systematic search
number_result <- find_cutpoint_number(
  data = crc_data,
  predictor = "Enterovirus",
  outcome_time = "time",
  outcome_event = "status",
  method = "systematic", # "systematic" is fast for a README
  max_cuts = 2,
  nmin = 0.15,           # Ensure each group has at least 15% of subjects
  seed = 42
)
print(number_result)
# The BIC suggests 2 cut-points are optimal for this data.
```
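To make the idea behind step 2 concrete, the sketch below compares Cox models with 0, 1 and 2 cut-points using AIC, AICc and BIC. It is a simplified illustration, not OptSurvCutR's implementation: it uses the `lung` dataset from the survival package with `age` as the predictor, places the cuts at quantiles instead of optimising them, and assumes the number of events as the sample size in the AICc and BIC penalties (one common convention for Cox models). The lowest value of a criterion indicates the number of cuts it prefers.

```r
library(survival)

# Illustration on survival::lung, NOT the crc_virome data.
# status is coded 1 = censored, 2 = dead, which Surv() accepts directly.
d <- na.omit(lung[, c("time", "status", "age")])
n_events <- sum(d$status == 2)  # assumed effective sample size for AICc/BIC

# Fit a Cox model after splitting age at `cuts` (no cuts = null model) and
# return AIC, AICc and BIC from the final partial log-likelihood.
criteria_for_cuts <- function(cuts) {
  if (length(cuts) == 0) {
    fit <- coxph(Surv(time, status) ~ 1, data = d)
  } else {
    d$group <- cut(d$age, breaks = c(-Inf, cuts, Inf))  # local copy of d
    fit <- coxph(Surv(time, status) ~ group, data = d)
  }
  ll <- fit$loglik[length(fit$loglik)]
  k <- length(cuts)  # k cuts -> k + 1 groups -> k log-hazard ratios
  aic <- -2 * ll + 2 * k
  c(AIC  = aic,
    AICc = aic + 2 * k * (k + 1) / (n_events - k - 1),
    BIC  = -2 * ll + k * log(n_events))
}

# Quantile placements stand in for optimised cut locations here.
candidates <- list(
  `0` = numeric(0),
  `1` = median(d$age),
  `2` = quantile(d$age, c(1/3, 2/3), names = FALSE)
)
tab <- t(sapply(candidates, criteria_for_cuts))  # rows = number of cuts
print(round(tab, 2))
rownames(tab)[which.min(tab[, "BIC"])]  # number of cuts preferred by BIC
```

Because every extra cut adds one hazard-ratio parameter, BIC penalises additional groups more heavily than AIC whenever log(events) > 2, which is why it tends to favour fewer cut-points.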
```r
# --- 3. Find the optimal VALUE of those cut-points ---
# We will find the locations for the 2 optimal cuts
cutpoint_result <- find_cutpoint(
  data = crc_data,
  predictor = "Enterovirus",
  outcome_time = "time",
  outcome_event = "status",
  num_cuts = 2,          # Use the result from the step above
  method = "systematic",
  nmin = 0.15,
  seed = 123
)
```
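The cut-points found in step 3 partition the predictor into ordered groups. The sketch below shows that idea with base survival functions. It assumes placeholder cut-points (the tertiles of `age` in survival's `lung` data) in place of real `find_cutpoint()` output, and it assumes right-closed intervals, so a value equal to a cut-point falls in the lower group; OptSurvCutR's own boundary convention may differ.

```r
library(survival)

d <- na.omit(lung[, c("time", "status", "age")])

# Placeholder cut-points (tertiles of age), purely illustrative; in practice
# these would be the values reported by find_cutpoint().
cuts <- quantile(d$age, c(1/3, 2/3), names = FALSE)

# Intervals (-Inf, c1], (c1, c2], (c2, Inf] become Low / Medium / High
d$group <- cut(d$age, breaks = c(-Inf, cuts, Inf),
               labels = c("Low", "Medium", "High"))
print(table(d$group))

# Kaplan-Meier curves and a log-rank test across the three groups
km <- survfit(Surv(time, status) ~ group, data = d)
lr <- survdiff(Surv(time, status) ~ group, data = d)
p_value <- pchisq(lr$chisq, df = nlevels(d$group) - 1, lower.tail = FALSE)
p_value
# plot(km, col = 1:3)  # draws the three survival curves
```

With three groups the log-rank statistic has 2 degrees of freedom, hence `df = nlevels(d$group) - 1`.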
```r
# --- 4. (Optional) Validate cut-point stability ---
# This step runs a bootstrap and can take a few minutes.
# It is recommended for a full analysis but can be skipped for a quick check.
validation_result <- validate_cutpoint(
  cutpoint_result = cutpoint_result,
  num_replicates = 25, # Use >= 500 for a real analysis
  seed = 456
)
summary(validation_result)
```
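Conceptually, bootstrap validation re-runs the cut-point search on resampled data and checks how far the selected value moves. The sketch below is a minimal single-cut illustration under stated assumptions: it uses survival's `lung` data with `age` as the predictor, a hypothetical helper `best_cut()` that maximises the log-rank chi-square, and a percentile interval. It is not how `validate_cutpoint()` is implemented, which may construct its intervals differently.

```r
library(survival)

d <- na.omit(lung[, c("time", "status", "age")])

# Hypothetical helper (not OptSurvCutR code): best single cut-point by
# log-rank chi-square, searching observed values between the nmin and
# 1 - nmin quantiles so that both groups stay reasonably large.
best_cut <- function(time, status, x, nmin = 0.15) {
  lims <- quantile(x, c(nmin, 1 - nmin), names = FALSE)
  grid <- sort(unique(x[x >= lims[1] & x < lims[2]]))
  if (length(grid) == 0) return(NA_real_)
  chisq <- vapply(grid, function(cp) {
    high <- x > cp
    survdiff(Surv(time, status) ~ high)$chisq
  }, numeric(1))
  grid[which.max(chisq)]
}

set.seed(456)
n_boot <- 50  # kept small for speed; use hundreds or more in practice
boot_cuts <- replicate(n_boot, {
  idx <- sample(nrow(d), replace = TRUE)  # resample patients with replacement
  best_cut(d$time[idx], d$status[idx], d$age[idx])
})

original <- best_cut(d$time, d$status, d$age)
ci <- quantile(boot_cuts, c(0.025, 0.975), na.rm = TRUE, names = FALSE)
cat("Cut-point:", original, "| 95% percentile interval:", ci[1], "to", ci[2], "\n")
```

A narrow interval suggests the cut-point is stable; a wide one suggests it depends heavily on which patients happen to be in the sample.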
```r
# --- 5. Visualise the Result ---
# The plot reveals three distinct risk groups (Low, Medium, High)
# based on Enterovirus abundance.
# We plot 'validation_result' (so step 4 must be run first), which shows
# the survival curves using the original optimal cuts found in step 3.
plot(validation_result, type = "outcome")
```

OptSurvCutR provides a three-step workflow for cut-point analysis:
- `find_cutpoint_number()`: Determines the statistically optimal number of cut-points using information criteria (AIC, AICc, or BIC).
- `find_cutpoint()`: Identifies the precise cut-point locations using systematic or genetic algorithms, optimising a chosen survival metric (log-rank statistic, hazard ratio, or p-value).
- `validate_cutpoint()`: Assesses the stability of the identified cut-points via bootstrap resampling, providing 95% confidence intervals.
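As an illustration of what a systematic search involves, the sketch below exhaustively scores every pair of candidate thresholds for two cut-points by the log-rank chi-square, discarding splits in which any group holds less than `nmin` of the subjects. The helper `grid_search_two_cuts()` is hypothetical, the data are survival's `lung` with `age` as the predictor, and OptSurvCutR's actual search may differ in its candidate grid, scoring, and tie handling.

```r
library(survival)

d <- na.omit(lung[, c("time", "status", "age")])

# Hypothetical helper (not OptSurvCutR's implementation): exhaustive search
# over threshold pairs (c1 < c2). Each of the three groups must contain at
# least `nmin` of the subjects; the score is the log-rank chi-square.
grid_search_two_cuts <- function(time, status, x, nmin = 0.15) {
  grid <- sort(unique(x))
  pairs <- combn(grid, 2)  # each column is one candidate (c1, c2)
  best <- list(cuts = c(NA_real_, NA_real_), chisq = -Inf)
  for (j in seq_len(ncol(pairs))) {
    cuts <- pairs[, j]
    group <- cut(x, breaks = c(-Inf, cuts, Inf))
    if (min(table(group)) / length(x) < nmin) next  # minimum group size
    chisq <- survdiff(Surv(time, status) ~ group)$chisq
    if (chisq > best$chisq) best <- list(cuts = cuts, chisq = chisq)
  }
  best
}

res <- grid_search_two_cuts(d$time, d$status, d$age, nmin = 0.15)
res$cuts
```

The number of candidate pairs grows quadratically with the number of distinct predictor values (and as choose(m, k) for k cuts), which is why a genetic algorithm becomes attractive for larger problems. Maximising a test statistic over many candidate splits also makes the nominal p-value of the winning split optimistic, so the selected cuts should be validated.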
- Vignettes: See browseVignettes("OptSurvCutR") for detailed tutorials, including analyses of the germination and crc_virome datasets.
- Package Website: Full function documentation and articles available at https://paytonyau.github.io/OptSurvCutR/ (or run pkgdown::build_site() locally).
- Manuscript: Read the accompanying paper for methodological details and further case studies: Yau, Payton T. O. "OptSurvCutR: Validated Cut-point Selection for Survival Analysis." bioRxiv preprint, posted October 10, 2025. https://doi.org/10.1101/2025.10.08.681246.
- NEWS.md: See the NEWS.md file for recent changes and version history.
- Code of Conduct: Please note that this project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
If you use OptSurvCutR in your research, please cite the accompanying manuscript:
```bibtex
@Article{,
  author = {Payton T. O. Yau},
  title = {OptSurvCutR: Validated Cut-point Selection for Survival Analysis},
  year = {2025},
  doi = {10.1101/2025.10.08.681246},
  publisher = {Cold Spring Harbor Laboratory},
  url = {https://www.biorxiv.org/content/10.1101/2025.10.08.681246},
  journal = {bioRxiv}
}
```

A JOSS submission is planned following rOpenSci review.
OptSurvCutR is developed and maintained without dedicated funding. If you find it helpful in your survival analysis research, please consider supporting its ongoing development and maintenance. Your contribution, big or small, directly helps free up more time for keeping the project alive and improving it.
Licensed under the GPL-3 License.
For questions or feedback, open an issue at GitHub Issues.