Issue #388: Refactor preprocessing functionality #390
Note that actually

    brms::stanvar(
      block = "data",
      scode = "array[N - wN] int noverlap;",
      x = filter(data, woverlap == 0)$row_id,
      name = "noverlap"
    ) +
As it's a transformed variable I think we can create it in latent individual. As it's an internal variable we might want to use .row_id to avoid clashes with user variables.
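To make the .row_id suggestion concrete, here is a minimal sketch of how the non-overlap index could be built from an internal row id. The toy data and the meaning of `woverlap` (1 when the censoring windows overlap) are assumptions for illustration, not the package's actual data, and the `brms::stanvar()` call only runs if brms is installed.

```r
# Toy data: `woverlap` is assumed to be 1 when the primary and secondary
# censoring windows overlap and 0 otherwise (hypothetical values).
data <- data.frame(
  delay = c(1.5, 0.4, 3.2, 0.8, 2.1),
  woverlap = c(0, 1, 0, 1, 0)
)

# Internal row index; the leading dot makes a clash with a user column unlikely.
data$.row_id <- seq_len(nrow(data))

# Row ids of observations without overlap, as an integer vector.
noverlap <- data$.row_id[data$woverlap == 0]

N <- nrow(data)
wN <- sum(data$woverlap == 1)

# The Stan declaration `array[N - wN] int noverlap;` needs exactly N - wN entries.
stopifnot(length(noverlap) == N - wN)

# Only build the stanvar when brms is available.
if (requireNamespace("brms", quietly = TRUE)) {
  sv <- brms::stanvar(
    x = noverlap,
    name = "noverlap",
    block = "data",
    scode = "array[N - wN] int noverlap;"
  )
}
```

Computing the index with base subsetting rather than `filter()` keeps the sketch free of a dplyr dependency; either approach should give the same vector.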
A pain point here is changing all the tests to use a date version of the simulated data (knowing that we will probably need to revert this as we add functionality for non-date data). It makes me think of having some temporary way to class data that's already in the right format. I think the solution is just a function like:

    as_epidist_linelist_time <- function(data) {
      class(data) <- c("epidist_linelist", class(data))
      epidist_validate_data(data)
      return(data)
    }
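For readers who want to try the idea outside the package, the following is a self-contained sketch. The `epidist_validate_data()` generic and its method here are stand-ins written for this example, and the required column names are assumptions rather than the package's actual schema.

```r
# Stand-in generic and method; the real package's checks may differ.
epidist_validate_data <- function(data, ...) {
  UseMethod("epidist_validate_data")
}

epidist_validate_data.epidist_linelist <- function(data, ...) {
  # Hypothetical column names, assumed for illustration only.
  required <- c("ptime_lwr", "ptime_upr", "stime_lwr", "stime_upr")
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(data)
}

# Prepend the class, validate, and return the data.
as_epidist_linelist_time <- function(data) {
  class(data) <- c("epidist_linelist", class(data))
  epidist_validate_data(data)
  data
}

# Simulated data that is already in time (not date) format.
sim <- data.frame(
  ptime_lwr = c(0, 1), ptime_upr = c(1, 2),
  stime_lwr = c(2, 4), stime_upr = c(3, 5)
)
linelist <- as_epidist_linelist_time(sim)
class(linelist)

# Data missing the secondary event columns fails validation.
bad <- tryCatch(
  as_epidist_linelist_time(sim[, 1:2]),
  error = function(e) conditionMessage(e)
)
```

Because the class is only prepended, the object remains a data.frame and existing test code that manipulates it should keep working.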
Sure, but just as a helper function in testthat?
It's currently a regular packaged function. It'll need to be used in some of the vignettes as well, so I would prefer to keep it as a packaged function. We could also consider putting it as an internal function and using
Note that I think we are going to need this functionality eventually anyway, so if this is something we want to think about and implement here I could be open to it.
We can't use an internal function in a vignette as it isn't CRAN compliant or good practice. My suggestion to use it as a testing function is so that we can implement it here quickly and think about the design in another issue, as I don't think we want to rush it. My preference would be that this is part of the simulation workflow and likely works via adding dates to simulated times, but as above I think it needs some thought.
In that case I'm in favour of adding it as an exported function (as it is currently) so that it can be used in the vignettes. Then that would give us a chance to:
This looks good.
I think I would have just a single validate function and dispatch into it from models and data. In the future we should also move to validating on modification rather than validating on entry to each use. We should also add a new_epidist_linelist method in the future.
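A rough sketch of what a single validator with a low-level constructor might look like follows. Everything here is hypothetical: the names `validate_epidist`, `assert_has_cols`, the `epidist_latent_individual` class, and the column names are assumptions for illustration, not the package's API.

```r
# Low-level constructor: only sets the class, no checks (hypothetical helper).
new_epidist_linelist <- function(data) {
  stopifnot(is.data.frame(data))
  class(data) <- c("epidist_linelist", setdiff(class(data), "epidist_linelist"))
  data
}

# One generic validator with methods for both data and model classes.
validate_epidist <- function(x, ...) UseMethod("validate_epidist")

# Shared check used by every method.
assert_has_cols <- function(x, cols) {
  missing_cols <- setdiff(cols, names(x))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(x)
}

validate_epidist.epidist_linelist <- function(x, ...) {
  # Hypothetical required columns for the data class.
  assert_has_cols(x, c("ptime_lwr", "stime_lwr"))
}

validate_epidist.epidist_latent_individual <- function(x, ...) {
  # Model-specific requirement first, then fall through to the data checks.
  assert_has_cols(x, ".row_id")
  NextMethod()
}

linelist <- new_epidist_linelist(data.frame(ptime_lwr = 0:1, stime_lwr = 2:3))
validate_epidist(linelist)

# A model object layered on top of the linelist class.
model_data <- linelist
model_data$.row_id <- seq_len(nrow(model_data))
class(model_data) <- c("epidist_latent_individual", class(model_data))
validate_epidist(model_data)
```

Using `NextMethod()` means the model class inherits the data checks for free, which is one way to read "dispatch into it from models and data"; other designs are equally plausible.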
* Sketching out refactor
* Some restructure for clarity
* Refactor along new date approach
* Clear tests
* Refactor lines to save col_names
* Refactor validation functionality
* Redocument
* Remove epidist_validate and move to _model and _data approach plus some linting
* Add documentation of as_epidist_linelist arguments
* Move assert_class into imports and use in place of "check" class
* Documentation for epidist_validate_data.epidist_linelist
* Clear up the direct model file a bit
* Add creating the row_id back in to as_latent_individual
* Passing test-direct_model
* Start working to make data use dates
* Add start of unit tests and bug fix for datetime class check
* Use .row_id rather than row_id
* Use as_epidist_linelist_time function so that tests work with time data
* Fixes to tests
* Group into preprocessing functions
* Update FAQ vignette to run
* Update get started vignette to run
* Update ebola vignette to run
* Update approximate inference vignette to run
* Add documentation
* Methods consistency
* Document ...
* Again on ...
* Remove comment moved to issue
* Include as_epidist_linelist_time ad-hoc
* Add test for datetime column
* Update text in vignettes and add note about the ad-hoc function being included in package soon
* Refactor .rename_columns

Former-commit-id: c573ba836b76170c03d5c493cbb378781db5fa23 [formerly bac50e38d758dfe0fdcfd98722dc50a5a98c0357]
Former-commit-id: 4e9d3ee55e7e0c90bd35366990f38aa4894b3439
Description
Important
Current review ask: ready for review
This PR will close #388.
Summary of what has been done:
* epidist_linelist class
* as_epidist_linelist as date-based entry point to epidist_linelist class
* as_epidist_linelist_time as temporary time-based entry point to epidist_linelist class
* epidist_validate_data for data classes (like epidist_linelist) and epidist_validate_model for model classes (like latent_individual)

Summary of what remains to do:
Summary of new issues / extensions:
* as_epidist_linelist #415: as_epidist_linelist could accept a user not passing an upper bound (and could assume that no upper bound means daily censoring)
* epidist_linelist: epidist_linelist including a better thought out time entry point
* epidist_linelist doesn't include a column for the "case". Should we enforce that users provide this? It could be useful to have for producing plots and things
* as_latent_individual #416: Right now as_latent_individual creates some new columns. Can we refactor these into helpers as was the original plan?
* epidist_linelist #402: In the class constructor for direct_model they are called ptime and stime and use a data.frame without going via epidist_linelist. Need to sort this out to go via direct_model
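As an illustration of the #415 idea (treating a missing upper bound as daily censoring), a helper along the following lines could fill in the bound. This is only a sketch: `default_upper_bound` is a hypothetical name, and "daily censoring" is interpreted here as an upper bound one day after the lower bound.

```r
# Hypothetical helper: fill a missing upper bound with lower bound + 1 day.
default_upper_bound <- function(lwr, upr = NULL) {
  stopifnot(inherits(lwr, "Date"))
  if (is.null(upr)) {
    # No upper bound supplied at all: assume daily censoring throughout.
    return(lwr + 1)
  }
  stopifnot(inherits(upr, "Date"), length(upr) == length(lwr))
  # Upper bound partly supplied: only fill the missing entries.
  missing_upr <- is.na(upr)
  upr[missing_upr] <- lwr[missing_upr] + 1
  upr
}

onset <- as.Date(c("2024-01-01", "2024-01-03"))

# No upper bound passed by the user.
all_default <- default_upper_bound(onset)

# Upper bound passed with one missing value.
partly_default <- default_upper_bound(onset, as.Date(c("2024-01-02", NA)))
```

Whether a partially missing upper bound should be filled or rejected is a design question the issue would need to settle; the sketch fills it for simplicity.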
Checklist