112 changes: 112 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,112 @@
# Project Overview

This project is an R package that directly queries the OMOP Common Data Model (CDM) to generate descriptive study results. It provides functions for defining settings, running SQL-backed analyses against an OMOP CDM instance, writing results to a database or files, and optionally exploring results in a Shiny app.

Characterization is analysis-first: code should help users produce reliable, transparent descriptive outputs from OMOP data with minimal hidden behavior.

## Primary Goals

- Keep the codebase easy to understand, test, and extend for new contributors.
- Ensure OMOP CDM queries and resulting summaries are correct, reproducible, and explicit.
- Keep the Shiny app intuitive for non-technical users exploring the results.
- Prefer maintainable, modular, and predictable code over clever one-off solutions.

## Folder Structure

- `/R`: Contains R functions for settings construction, analysis orchestration, database I/O, and app launch.
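- `/inst`: Contains the SQL templates (in `inst/sql/sql_server`) used to query OMOP CDM tables and create result tables, plus configuration files and example data.
- `/man`: Contains the function reference documentation (`.Rd` files) generated by roxygen2 from comments in `/R`; edit the roxygen comments, not these files.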
- `/tests`: Contains testthat unit and integration tests.
- `/vignettes`: Contains R Markdown walkthroughs and usage guidance.

## OMOP CDM Query Principles

- Keep SQL logic deterministic and aligned with the intended OMOP CDM table semantics.
- Prefer explicit cohort, concept, and time-at-risk definitions over implicit defaults.
- Isolate SQL generation and execution from presentation/UI logic.
- Validate required OMOP inputs early with clear, actionable error messages.
- Preserve compatibility with supported database platforms when editing SQL templates.
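
As a minimal sketch of early validation, assume a hypothetical helper named `checkRequiredInputs()` (it is not part of the package). It stops before any SQL is generated, and each message names the offending argument and gives an example of a valid value:

```r
# Hypothetical helper, not part of Characterization: validates inputs up front
# so that errors surface before any SQL is rendered or executed.
checkRequiredInputs <- function(cdmDatabaseSchema, targetIds) {
  if (!is.character(cdmDatabaseSchema) || length(cdmDatabaseSchema) != 1 ||
    is.na(cdmDatabaseSchema) || !nzchar(cdmDatabaseSchema)) {
    stop(
      "`cdmDatabaseSchema` must be a single, non-empty schema name, for example 'main'.",
      call. = FALSE
    )
  }
  if (!is.numeric(targetIds) || length(targetIds) == 0 || anyNA(targetIds)) {
    stop(
      "`targetIds` must be a non-empty numeric vector of cohort IDs with no missing values.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Valid inputs pass silently.
checkRequiredInputs(cdmDatabaseSchema = "main", targetIds = c(1, 2))

# Invalid inputs fail with a message that says what to fix.
tryCatch(
  checkRequiredInputs(cdmDatabaseSchema = "", targetIds = c(1, 2)),
  error = function(e) message(conditionMessage(e))
)
```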

## Contribution-Friendly Design Principles

- Favor small, single-responsibility functions and modules.
- Keep business/data logic separate from query execution and UI rendering logic.
- Prefer explicit inputs and outputs (clear function signatures, no hidden global dependencies).
- Reuse existing helper functions before adding new abstractions.
- Keep changes minimal and focused; avoid refactoring unrelated code in the same PR.
- If introducing a non-obvious pattern, document the rationale in roxygen or vignette notes.


## Analysis Pipeline Expectations

- Keep settings creation functions explicit, composable, and easy to reason about.
- Ensure orchestration functions call lower-level query helpers in a traceable order.
- Avoid hidden side effects in analysis functions; return or persist outputs consistently.
- Keep naming consistent across settings, SQL outputs, and exported result tables.

## User Experience Standards (Layperson-First)

- Write labels, titles, and help text in plain language; avoid jargon where possible.
- If technical terms are necessary, define them inline (short tooltip/help text).
- Present results in progressive detail: summary first, technical detail on demand.
- Prioritize readability of tables/plots (clear titles, units, legends, and sensible defaults).
- Use consistent naming for concepts across tabs, modules, and documentation.
- Handle empty/invalid/no-data states with clear guidance on what users should do next.
- Prefer predictable interactions over dense control panels.

## Documentation Expectations

- All exported functions must have complete roxygen2 docs with examples when feasible.
- For complex query modules, include a short "how it works" section in code comments or vignettes.
- Update `README.md` or vignettes when user-visible behavior changes.
- Keep terminology in docs aligned with UI text.
- Document key assumptions about OMOP CDM inputs, required tables, and expected output schema.

## Testing Expectations

- Add or update `testthat` tests for any behavior change in computation or data transformation.
- For query logic, test SQL generation paths and result-shaping helpers where feasible.
- For Shiny-related logic, test helper functions and core reactive logic where feasible.
- Cover edge cases: missing values, empty result sets, invalid inputs, and boundary conditions.
- Avoid brittle snapshot-like checks unless output stability is intentional.
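
For example, edge-case tests for a result-shaping helper might look like the sketch below. `filterByMinMean()` is a hypothetical helper written only to illustrate the test style; it is not a function in this package.

```r
library(testthat)

# Hypothetical result-shaping helper: keeps covariates whose average value
# is at or above a minimum threshold and drops rows with a missing average.
filterByMinMean <- function(covariates, minMean) {
  keep <- !is.na(covariates$averageValue) & covariates$averageValue >= minMean
  return(covariates[keep, , drop = FALSE])
}

test_that("filterByMinMean returns an empty result for empty input", {
  emptyCovariates <- data.frame(covariateId = numeric(0), averageValue = numeric(0))
  result <- filterByMinMean(covariates = emptyCovariates, minMean = 0.001)
  expect_equal(object = nrow(result), expected = 0)
})

test_that("filterByMinMean drops missing and below-threshold averages", {
  covariates <- data.frame(
    covariateId = c(1, 2, 3),
    averageValue = c(0.0005, NA, 0.2)
  )
  result <- filterByMinMean(covariates = covariates, minMean = 0.001)
  expect_equal(object = result$covariateId, expected = 3)
})

test_that("filterByMinMean keeps averages exactly at the threshold", {
  covariates <- data.frame(covariateId = 1, averageValue = 0.001)
  result <- filterByMinMean(covariates = covariates, minMean = 0.001)
  expect_equal(object = nrow(result), expected = 1)
})
```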

## Libraries and Frameworks

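- DatabaseConnector, SqlRender, and DBI: for connecting to and querying OMOP CDM databases, following the patterns already used in this codebase. SqlRender translates the SQL Server-dialect templates in `inst/sql/sql_server` to the other supported platforms.
- Shiny: for building the interactive results viewer.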
- roxygen2: For generating documentation from R code comments.
- testthat: For unit testing the package functions.

## Performance and Reliability

- Avoid repeated expensive OMOP queries or computations inside loops and reactive contexts.
- Cache or memoize only when it improves responsiveness and remains easy to reason about.
- Surface failures with actionable error messages for users and developers.
- Do not silently swallow errors that hide data quality or pipeline issues.
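
A minimal sketch of the first point, assuming a hypothetical `fetchCohortCounts()` that stands in for a real database query: one query for all target cohorts, split in R afterwards, replaces one round trip per cohort.

```r
# Hypothetical stand-in for an expensive OMOP query; the counter records how
# many times the database would have been queried.
queryLog <- new.env()
queryLog$calls <- 0

fetchCohortCounts <- function(targetIds) {
  queryLog$calls <- queryLog$calls + 1
  # Fake result: one row per target cohort.
  data.frame(targetId = targetIds, personCount = targetIds * 100)
}

targetIds <- c(1, 2, 4)

# Avoid: one query per target cohort inside a loop.
perTargetCounts <- lapply(
  X = targetIds,
  FUN = function(targetId) fetchCohortCounts(targetIds = targetId)
)
callsInLoop <- queryLog$calls

# Prefer: one query for all target cohorts, then split the result in memory.
queryLog$calls <- 0
allCounts <- fetchCohortCounts(targetIds = targetIds)
countsByTarget <- split(x = allCounts, f = allCounts$targetId)
callsBatched <- queryLog$calls
```

Here `callsInLoop` is 3 and `callsBatched` is 1; the gap grows with the number of cohorts.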

## Copilot Guidance for This Repository

- Match existing naming and file organization before proposing new patterns.
- Generate code that is explicit and readable for new contributors.
- Prefer incremental changes over large rewrites.
- Keep query-building and result-transformation logic auditable.
- When adding a new feature, suggest where tests and docs should be updated in the same change.
- Do not introduce new package dependencies unless clearly justified.

## Coding Standards

- We use camelCase in R. Function and variable names start with a lowercase letter; package names start with an uppercase letter (for example, Characterization and FeatureExtraction).
- Function names typically start with a verb, and variable names are typically nouns. Do not encode the data type in variable names. Avoid adding "data" to a name unless it is unavoidable, since everything is data.
- Place spaces around all infix operators (=, +, -, <-, etc.). The same rule applies when using = in function calls. Always put a space after a comma, and never before (just like in regular English).
- Always indent code inside curly braces. Very short statements may stay on the same line.
- Use <-, not =, for assignment.
- When calling a function with more than one argument, pass every argument by name instead of relying on argument order.
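
A short sketch of code that follows these conventions. The helper `computeProportion()` is hypothetical and exists only for illustration.

```r
# Illustrative only: computeProportion() is a hypothetical helper.
# The function name starts with a verb; variable names are camelCase nouns.
computeProportion <- function(count, total) {
  if (total == 0) {
    return(NA_real_)
  }
  proportion <- count / total
  return(proportion)
}

# More than one argument, so each one is passed by name.
outcomeProportion <- computeProportion(count = 25, total = 200)

# Avoid: type-encoded name, `=` for assignment, missing spaces, positional arguments.
# dfProportion=computeProportion(25,200)
```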

## Pull Request Checklist (for contributors and Copilot)

- Is the change understandable by a developer new to the project?
- Are function/module responsibilities clear and focused?
- Does the change preserve correctness of OMOP CDM query behavior?
- Are user-facing labels/messages plain language and consistent?
- Were tests and docs updated for behavior changes?
- Does the UI communicate key results simply before exposing advanced detail?
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,8 +1,8 @@
Package: Characterization
Type: Package
Title: Implement Descriptive Studies Using the Common Data Model
-Version: 3.0.0
-Date: 2026-2-26
+Version: 3.0.1
+Date: 2026-4-15
Authors@R: c(
person("Jenna", "Reps", , "jreps@its.jnj.com", role = c("aut", "cre")),
person("Patrick", "Ryan", , "ryan@ohdsi.org", role = c("aut")),
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,3 +1,7 @@
+Characterization 3.0.1
+======================
+- Fix uploading results into the database for the Shiny viewer: numbers written to the CSV files were padded with spaces, and the average-value columns of continuous covariates were typed as bigint instead of float.
+
Characterization 3.0.0
======================
- Split aggregateCovariates into riskFactor, targetBaseline, and caseSeries to make the inputs clearer.
68 changes: 34 additions & 34 deletions R/Database.R
@@ -77,45 +77,45 @@ createSqliteDatabase <- function(
#'
#' @examples
#'
-#' # generate results into resultsFolder
-#' conDet <- exampleOmopConnectionDetails()
+#' ## generate results into resultsFolder
+#' #conDet <- exampleOmopConnectionDetails()
#'
-#' tteSet <- createTimeToEventSettings(
-#' targetIds = c(1,2),
-#' outcomeIds = 3
-#' )
+#' #tteSet <- createTimeToEventSettings(
+#' #targetIds = c(1,2),
+#' # outcomeIds = 3
+#' # )
#'
-#' cSet <- createCharacterizationSettings(
-#' timeToEventSettings = tteSet
-#' )
+#' #cSet <- createCharacterizationSettings(
+#' # timeToEventSettings = tteSet
+#' #)
#'
-#' runCharacterizationAnalyses(
-#' connectionDetails = conDet,
-#' targetDatabaseSchema = 'main',
-#' targetTable = 'cohort',
-#' outcomeDatabaseSchema = 'main',
-#' outcomeTable = 'cohort',
-#' cdmDatabaseSchema = 'main',
-#' characterizationSettings = cSet,
-#' outputDirectory = file.path(tempdir(),'database')
-#' )
+#' #runCharacterizationAnalyses(
+#' # connectionDetails = conDet,
+#' # targetDatabaseSchema = 'main',
+#' # targetTable = 'cohort',
+#' # outcomeDatabaseSchema = 'main',
+#' # outcomeTable = 'cohort',
+#' # cdmDatabaseSchema = 'main',
+#' # characterizationSettings = cSet,
+#' # outputDirectory = file.path(tempdir(),'database')
+#' #)
#'
-#' # create sqlite database
-#' charResultDbCD <- createSqliteDatabase()
+#' ## create sqlite database
+#' #charResultDbCD <- createSqliteDatabase()
#'
-#' # create database results tables
-#' createCharacterizationTables(
-#' connectionDetails = charResultDbCD,
-#' resultSchema = 'main'
-#' )
+#' ## create database results tables
+#' #createCharacterizationTables(
+#' # connectionDetails = charResultDbCD,
+#' # resultSchema = 'main'
+#' # )
#'
-#' # insert results
-#' insertResultsToDatabase(
-#' connectionDetails = charResultDbCD,
-#' schema = 'main',
-#' resultsFolder = file.path(tempdir(),'database'),
-#' includedFiles = c('time_to_event')
-#' )
+#' ## insert results
+#' #insertResultsToDatabase(
+#' # connectionDetails = charResultDbCD,
+#' # schema = 'main',
+#' # resultsFolder = file.path(tempdir(),'database'),
+#' # includedFiles = c('time_to_event')
+#' #)
#'
#'
#' @export
@@ -382,7 +382,7 @@ getResultTables <- function() {
# https://github.com/tidyverse/readr/issues/671#issuecomment-300567232
formatDouble <- function(x, scientific = FALSE, ...) {
doubleCols <- vapply(x, is.double, logical(1))
-x[doubleCols] <- lapply(x[doubleCols], format, scientific = scientific, ...)
+x[doubleCols] <- lapply(x[doubleCols], format, trim = TRUE, scientific = scientific, ...)

return(x)
}
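
The 3.0.1 NEWS entry attributes part of the upload problem to spacing in the CSV files. A minimal sketch of why `trim = TRUE` matters: by default `format()` pads numbers to a common width, so padded strings like the ones below would be written out by `utils::write.csv()`. The helper is a copy of the patched `formatDouble()` from the hunk above; the data frame and its column names are made up for illustration.

```r
# Copy of the patched helper from R/Database.R, reproduced so this sketch runs on its own.
formatDouble <- function(x, scientific = FALSE, ...) {
  doubleCols <- vapply(x, is.double, logical(1))
  x[doubleCols] <- lapply(x[doubleCols], format, trim = TRUE, scientific = scientific, ...)
  return(x)
}

# Illustrative data; the column names are hypothetical.
results <- data.frame(settingId = c(1L, 2L), averageValue = c(1, 100))

# Without trim, format() pads shorter numbers with leading spaces to a common width.
padded <- format(results$averageValue, scientific = FALSE)

# The patched helper converts double columns to unpadded strings;
# integer columns such as settingId are left unchanged.
formatted <- formatDouble(results)
```

Here `padded` is `c("  1", "100")`, while `formatted$averageValue` is `c("1", "100")`.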
6 changes: 3 additions & 3 deletions R/RunCharacterization.R
@@ -263,7 +263,7 @@ runCharacterizationAnalyses <- function(
threads = 1,
cohortGenerationThreads = NULL,
nTargetJobs = 1,
-minCharacterizationMean = 0.01, # is this global or within cov set?
+minCharacterizationMean = 0.001, # is this global or within cov set?
minCovariateCount = 0, # is this global or within cov set?
mode = 'CohortIncidence',
minSMD = 0
@@ -762,7 +762,7 @@ exportSharedObjects <- function(
data$database_id <- databaseId
data$setting_id <- executionId
utils::write.csv(
-x = data,
+x = formatDouble(data),
file = file.path(saveLocation, paste0(tablePrefix,'target_settings.csv')),
row.names = FALSE
)
@@ -791,7 +791,7 @@ exportSharedObjects <- function(
data$database_id <- databaseId
data$setting_id <- executionId
utils::write.csv(
-x = data,
+x = formatDouble(data),
file = file.path(saveLocation, paste0(tablePrefix,'case_settings.csv')),
row.names = FALSE
)
3 changes: 1 addition & 2 deletions R/ViewShiny.R
@@ -228,8 +228,7 @@ prepareCharacterizationShiny <- function(
}

viewChars <- function(
-databaseSettings,
-testApp = F
+databaseSettings
) {
ensure_installed("OhdsiShinyAppBuilder")

8 changes: 4 additions & 4 deletions README.md
@@ -58,9 +58,9 @@ targetIds <- c(1,2,4)
riskWindowEnd = 365,
endAnchor = 'cohort start',
covariateSettings = FeatureExtraction::createCovariateSettings(
-useDemographicsGender = T,
-useDemographicsAge = T,
-useDemographicsRace = T
+useDemographicsGender = TRUE,
+useDemographicsAge = TRUE,
+useDemographicsRace = TRUE
)
)

@@ -72,7 +72,7 @@ targetIds <- c(1,2,4)
riskWindowEnd = 365,
endAnchor = 'cohort start',
covariateSettings = FeatureExtraction::createCovariateSettings(
-useConditionOccurrenceLongTerm = T
+useConditionOccurrenceLongTerm = TRUE
)
)

6 changes: 3 additions & 3 deletions inst/sql/sql_server/ResultTables.sql
@@ -145,7 +145,7 @@ CREATE TABLE @my_schema.@table_prefixrisk_factor_covariates_continuous (
non_case_count_value bigint,
non_case_min_value float,
non_case_max_value float,
-non_case_average_value bigint,
+non_case_average_value float,
non_case_standard_deviation float,
non_case_median_value float,
non_case_p_10_value float,
@@ -190,7 +190,7 @@ CREATE TABLE @my_schema.@table_prefixcase_series_covariates_continuous (
during_count_value bigint,
during_min_value float,
during_max_value float,
-during_average_value bigint,
+during_average_value float,
during_standard_deviation float,
during_median_value float,
during_p_10_value float,
@@ -200,7 +200,7 @@ CREATE TABLE @my_schema.@table_prefixcase_series_covariates_continuous (
after_count_value bigint,
after_min_value float,
after_max_value float,
-after_average_value bigint,
+after_average_value float,
after_standard_deviation float,
after_median_value float,
after_p_10_value float,
66 changes: 33 additions & 33 deletions man/insertResultsToDatabase.Rd

2 changes: 1 addition & 1 deletion man/runCharacterizationAnalyses.Rd

2 changes: 1 addition & 1 deletion tests/testthat/setup.R
@@ -1,4 +1,4 @@
-connectionDetails <- Characterization::exampleOmopConnectionDetails()
+connectionDetails <- exampleOmopConnectionDetails()
readr::local_edition(1)
withr::defer(
{