Initialize intercept as logit of outcome rate for logistic regression models

Hi @msuchard ,

I believe currently the intercept is initialized as zero (at least for logistic regression, only model I checked). Given we are often dealing with rare outcomes I think it would be better to use the `logit` of the outcome rate ( `log(p / (1 - p))` ).  In this case the model converges faster (as measured in fever iterations, see below) and seems to give the same results. I do think this could end up giving ~ 20-30% speed up for outcome rates of 1-3% which are common in our data. The example below for Eunomia gives a speedup of over 2x with outcome rate of 2%, but it's rather small data (n=1800, p=372).  There's also a pure `Cyclops` reprex at the bottom giving about 30% fewer iterations but I had to change the `simulateCyclopsData` to give 1-3% outcome rates.



Of course I could supply `startingCoefficients` myself from the PLP to add this. But if there are no downsides maybe this should be the default in `Cyclops` ?  What do you think ?

reprex:
``` r
library(PatientLevelPrediction)
    library(Cyclops)
    
    connectionDetails <- Eunomia::getEunomiaConnectionDetails()
#> attempting to download GiBleed
#> attempting to extract and load: /tmp/RtmpplDETg/GiBleed_5.3.zip to: /tmp/RtmpplDETg/GiBleed_5.3.sqlite
    Eunomia::createCohorts(connectionDetails)
#> Cohorts created in table main.cohort
#>   cohortId       name
#> 1        1  Celecoxib
#> 2        2 Diclofenac
#> 3        3    GiBleed
#> 4        4     NSAIDs
#>                                                                                        description
#> 1    A simplified cohort definition for new users of celecoxib, designed specifically for Eunomia.
#> 2    A simplified cohort definition for new users ofdiclofenac, designed specifically for Eunomia.
#> 3 A simplified cohort definition for gastrointestinal bleeding, designed specifically for Eunomia.
#> 4       A simplified cohort definition for new users of NSAIDs, designed specifically for Eunomia.
#>   count
#> 1  1844
#> 2   850
#> 3   479
#> 4  2694
    outcomeId <- 3 # GIbleed
    seed <- 42
    
    databaseDetails <- createDatabaseDetails(
        connectionDetails = connectionDetails,
        cdmDatabaseSchema = "main",
        cdmDatabaseName = "main",
        cohortDatabaseSchema = "main",
        cohortTable = "cohort",
        outcomeDatabaseSchema = "main",
        outcomeTable =  "cohort",
        targetId = 1,
        outcomeIds = outcomeId,
        cdmVersion = 5)
#> No cdm database id entered so using cdmDatabaseSchema - if cdmDatabaseSchema is the same for multiple different databases, please use cdmDatabaseId to specify a unique identifier for the database and version
    
    covariateSettings <- FeatureExtraction::createDefaultCovariateSettings()
    
    plpData <- getPlpData(databaseDetails = databaseDetails,
                          covariateSettings = covariateSettings,
                          restrictPlpDataSettings = createRestrictPlpDataSettings())
#> Connecting using SQLite driver
#> 
#> Constructing the at risk cohort
#>   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
#> Executing SQL took 0.026 secs
#> Fetching cohorts from server
#> Loading cohorts took 0.031 secs
#> Constructing features on server
#>   |                                                                              |                                                                      |   0%  |                                                                              |=                                                                     |   1%  |                                                                              |=                                                                     |   2%  |                                                                              |==                                                                    |   2%  |                                                                              |==                                                                    |   3%  |                                                                              |===                                                                   |   4%  |                                                                              |===                                                                   |   5%  |                                                                              |====                                                                  |   5%  |                                                                              |====                                                                  |   6%  |                                                                              |=====                                                                 |   7%  |                                                                              |=====                                                                 |   8%  |                                                                              |======                                                                |   8%  |                                                                              |======                                                                |   9%  |                                                                              |=======                                                               |   9%  |                                                                              |=======                                                               |  10%  |                                                                              |=======                                                               |  11%  |                                                                              |========                                                              |  11%  |                                                                              |========                                                              |  12%  |                                                                              |=========                                                             |  12%  |                                                                              |=========                                                             |  13%  |                                                                              |=========                                                             |  14%  |                                                                              |==========                                                            |  14%  |                                                                              |==========                                                            |  15%  |                                                                              |===========                                                           |  15%  |                                                                              |===========                                                           |  16%  |                                                                              |============                                                          |  17%  |                                                                              |============                                                          |  18%  |                                                                              |=============                                                         |  18%  |                                                                              |=============                                                         |  19%  |                                                                              |==============                                                        |  20%  |                                                                              |==============                                                        |  21%  |                                                                              |===============                                                       |  21%  |                                                                              |===============                                                       |  22%  |                                                                              |================                                                      |  22%  |                                                                              |================                                                      |  23%  |                                                                              |=================                                                     |  24%  |                                                                              |=================                                                     |  25%  |                                                                              |==================                                                    |  25%  |                                                                              |==================                                                    |  26%  |                                                                              |===================                                                   |  27%  |                                                                              |===================                                                   |  28%  |                                                                              |====================                                                  |  28%  |                                                                              |====================                                                  |  29%  |                                                                              |=====================                                                 |  29%  |                                                                              |=====================                                                 |  30%  |                                                                              |======================                                                |  31%  |                                                                              |======================                                                |  32%  |                                                                              |=======================                                               |  32%  |                                                                              |=======================                                               |  33%  |                                                                              |========================                                              |  34%  |                                                                              |========================                                              |  35%  |                                                                              |=========================                                             |  35%  |                                                                              |=========================                                             |  36%  |                                                                              |==========================                                            |  36%  |                                                                              |==========================                                            |  37%  |                                                                              |==========================                                            |  38%  |                                                                              |===========================                                           |  38%  |                                                                              |===========================                                           |  39%  |                                                                              |============================                                          |  39%  |                                                                              |============================                                          |  40%  |                                                                              |============================                                          |  41%  |                                                                              |=============================                                         |  41%  |                                                                              |=============================                                         |  42%  |                                                                              |==============================                                        |  42%  |                                                                              |==============================                                        |  43%  |                                                                              |===============================                                       |  44%  |                                                                              |===============================                                       |  45%  |                                                                              |================================                                      |  45%  |                                                                              |================================                                      |  46%  |                                                                              |=================================                                     |  47%  |                                                                              |=================================                                     |  48%  |                                                                              |==================================                                    |  48%  |                                                                              |==================================                                    |  49%  |                                                                              |===================================                                   |  50%  |                                                                              |====================================                                  |  51%  |                                                                              |====================================                                  |  52%  |                                                                              |=====================================                                 |  52%  |                                                                              |=====================================                                 |  53%  |                                                                              |======================================                                |  54%  |                                                                              |======================================                                |  55%  |                                                                              |=======================================                               |  55%  |                                                                              |=======================================                               |  56%  |                                                                              |========================================                              |  57%  |                                                                              |========================================                              |  58%  |                                                                              |=========================================                             |  58%  |                                                                              |=========================================                             |  59%  |                                                                              |==========================================                            |  59%  |                                                                              |==========================================                            |  60%  |                                                                              |==========================================                            |  61%  |                                                                              |===========================================                           |  61%  |                                                                              |===========================================                           |  62%  |                                                                              |============================================                          |  62%  |                                                                              |============================================                          |  63%  |                                                                              |============================================                          |  64%  |                                                                              |=============================================                         |  64%  |                                                                              |=============================================                         |  65%  |                                                                              |==============================================                        |  65%  |                                                                              |==============================================                        |  66%  |                                                                              |===============================================                       |  67%  |                                                                              |===============================================                       |  68%  |                                                                              |================================================                      |  68%  |                                                                              |================================================                      |  69%  |                                                                              |=================================================                     |  70%  |                                                                              |=================================================                     |  71%  |                                                                              |==================================================                    |  71%  |                                                                              |==================================================                    |  72%  |                                                                              |===================================================                   |  72%  |                                                                              |===================================================                   |  73%  |                                                                              |====================================================                  |  74%  |                                                                              |====================================================                  |  75%  |                                                                              |=====================================================                 |  75%  |                                                                              |=====================================================                 |  76%  |                                                                              |======================================================                |  77%  |                                                                              |======================================================                |  78%  |                                                                              |=======================================================               |  78%  |                                                                              |=======================================================               |  79%  |                                                                              |========================================================              |  79%  |                                                                              |========================================================              |  80%  |                                                                              |=========================================================             |  81%  |                                                                              |=========================================================             |  82%  |                                                                              |==========================================================            |  82%  |                                                                              |==========================================================            |  83%  |                                                                              |===========================================================           |  84%  |                                                                              |===========================================================           |  85%  |                                                                              |============================================================          |  85%  |                                                                              |============================================================          |  86%  |                                                                              |=============================================================         |  86%  |                                                                              |=============================================================         |  87%  |                                                                              |=============================================================         |  88%  |                                                                              |==============================================================        |  88%  |                                                                              |==============================================================        |  89%  |                                                                              |===============================================================       |  89%  |                                                                              |===============================================================       |  90%  |                                                                              |===============================================================       |  91%  |                                                                              |================================================================      |  91%  |                                                                              |================================================================      |  92%  |                                                                              |=================================================================     |  92%  |                                                                              |=================================================================     |  93%  |                                                                              |==================================================================    |  94%  |                                                                              |==================================================================    |  95%  |                                                                              |===================================================================   |  95%  |                                                                              |===================================================================   |  96%  |                                                                              |====================================================================  |  97%  |                                                                              |====================================================================  |  98%  |                                                                              |===================================================================== |  98%  |                                                                              |===================================================================== |  99%  |                                                                              |======================================================================| 100%
#> Executing SQL took 3.26 secs
#> Fetching data from server
#> Fetching data took 0.312 secs
#> Fetching outcomes from server
#> Loading outcomes took 0.0241 secs
    
    population <- createStudyPopulation(
        plpData,
        outcomeId = outcomeId,
        populationSettings = createStudyPopulationSettings(
            minTimeAtRisk = 1,
            riskWindowEnd = 15) # short TAR end for lower outcome rate
    )
#> outcomeId: 3
#> binary: TRUE
#> includeAllOutcomes: TRUE
#> firstExposureOnly: FALSE
#> washoutPeriod: 0
#> removeSubjectsWithPriorOutcome: TRUE
#> priorOutcomeLookback: 99999
#> requireTimeAtRisk: TRUE
#> minTimeAtRisk: 1
#> restrictTarToCohortEnd: FALSE
#> riskWindowStart: 1
#> startAnchor: cohort start
#> riskWindowEnd: 15
#> endAnchor: cohort start
#> restrictTarToCohortEnd: FALSE
#> Removing subjects with prior outcomes (if any)
#> Removing non outcome subjects with insufficient time at risk (if any)
#> Outcome is 0 or 1
#> Population created with: 1800 observations, 1800 unique subjects and 38 outcomes
#> Population created in 0.044 secs
    
    population$y <- population$outcomeCount
    plpData$covariateData$outcomes <- population
    cyclopsData <- convertToCyclopsData(plpData$covariateData$outcomes,
                                        plpData$covariateData$covariates,
                                        modelType = "lr",
                                        addIntercept = TRUE)
    
    prior <- createPrior("laplace", exclude = c(0), useCrossValidation = FALSE)
    
    control <- createControl(noiseLevel = "quiet",
                             cvRepetitions = 1,
                             tolerance = 1e-9,
                             seed = seed)
    
    out <- capture.output(fit <- fitCyclopsModel(cyclopsData, prior, control))
    
    
    
    # Now fit the same model with starting coefficients
    cyclopsData <- convertToCyclopsData(plpData$covariateData$outcomes,
                                        plpData$covariateData$covariates,
                                        modelType = "lr",
                                        addIntercept = TRUE)
    prior <- createPrior("laplace", exclude = c(0), useCrossValidation = FALSE)
    control <- createControl(noiseLevel = "quiet",
                             cvRepetitions = 1,
                             seed = seed,
                             tolerance = 1e-9)
    ncovars <- length(summary(cyclopsData)$covariateId)
    startingCoefficients <- rep(0, ncovars)
    outcomeRate <- mean(population$y)
    startingCoefficients[1] <- log(outcomeRate / (1 - outcomeRate))
    out2 <- capture.output(fit2 <- fitCyclopsModel(cyclopsData,
                                                   prior,
                                                   control,
                                                   startingCoefficients = startingCoefficients))
    message("Original Cyclops fit complete in ", round(fit$timeFit, 4), " seconds. And ",
            fit$iterations, " iterations. With logLikelihood: ", fit$log_likelihood)
#> Original Cyclops fit complete in 0.018 seconds. And 125 iterations. With logLikelihood: -166.622177178957
    
    message("Modified Cyclops fit complete in ", round(fit2$timeFit, 4), " seconds. And ",
            fit2$iterations, " iterations. With logLikelihood: ", fit2$log_likelihood)
#> Modified Cyclops fit complete in 0.0077 seconds. And 67 iterations. With logLikelihood: -166.622177181422
```

<sup>Created on 2025-01-29 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>


Pure `Cyclops` reprex:
``` r
library(Cyclops)
    
    # modify https://github.com/OHDSI/Cyclops/blob/develop/R/Simulation.R#L80C1-L80C63
    # strataBackgroundProb <- runif(nstrata,min=0.01,max=0.03)
    
    seed <- 666
    set.seed(seed)
    
    nrows <- 10000
    ncovars <- 500
    
    data <- simulateCyclopsData(nstrata = 1,
                                nrows = nrows,
                                ncovars = ncovars,
                                model = "logistic")
#> Sparseness = 98.99538 %
    
    cyclopsData <- convertToCyclopsData(data$outcomes,
                                        data$covariates,
                                        modelType = "lr",
                                        addIntercept = TRUE)
#> Sorting covariates by covariateId and rowId
    
    prior <- createPrior("laplace", exclude = c(0), useCrossValidation = FALSE)
    
    control <- createControl(noiseLevel = "quiet",
                             cvType = "auto",
                             cvRepetitions = 1,
                             seed = seed)
    out <- capture.output(fit <- fitCyclopsModel(cyclopsData, prior, control))
    
    
    # Now fit the same model with starting coefficients
    cyclopsData <- convertToCyclopsData(data$outcomes,
                                        data$covariates,
                                        modelType = "lr",
                                        addIntercept = TRUE)
#> Sorting covariates by covariateId and rowId
    prior <- createPrior("laplace", exclude = c(0), useCrossValidation = FALSE)
    control <- createControl(noiseLevel = "quiet",
                             cvRepetitions = 1,
                             seed = seed)
    startingCoefficients <- rep(0, ncovars + 1)
    outcomeRate <- mean(data$outcomes$y)
    startingCoefficients[1] <- log(outcomeRate / (1 - outcomeRate))
    out2 <- capture.output(fit2 <- fitCyclopsModel(cyclopsData,
                                                   prior,
                                                   control,
                                                   startingCoefficients = startingCoefficients))
    
    message("Original Cyclops fit complete in ", round(fit$timeFit, 4), " seconds. And ",
            fit$iterations, " iterations. With logLikelihood: ", fit$log_likelihood)
#> Original Cyclops fit complete in 0.0686 seconds. And 17 iterations. With logLikelihood: -756.202262114616
    
    message("Modified Cyclops fit complete in ", round(fit2$timeFit, 4), " seconds. And ",
            fit2$iterations, " iterations. With logLikelihood: ", fit2$log_likelihood)
#> Modified Cyclops fit complete in 0.0472 seconds. And 12 iterations. With logLikelihood: -756.202248337887
```

<sup>Created on 2025-01-29 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initialize intercept as logit of outcome rate for logistic regression models #81

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Initialize intercept as logit of outcome rate for logistic regression models #81

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions