Data quality rules

I'm not sure there is a place yet for data quality rules but, of the existing repositories, this seems the closest home for defining a structure for them.

Ultimately it would be useful for the user to be able to provide a list of data quality rules, either picked from a pre-existing/pre-defined ruleset or written as custom rules, in a human-readable format. The format should have a well-defined syntax that lets the user define custom rules together with their error messages or actions. Alternatively, could this mirror REDCap's syntax for skip logic?
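To make the idea concrete, here is a minimal Python sketch of what a human-readable rule with an attached error message could look like. Everything in it is an assumption for discussion rather than an existing API: the one-line `name: left op right -> message` syntax, the `Rule`, `parse_rule` and `check` names, and the field names `neutrophils` and `wbc`. It only supports a single comparison per rule, which is far less than REDCap-style logic would allow.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Comparison operators allowed in the (hypothetical) rule syntax.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


@dataclass
class Rule:
    name: str
    left: str     # field name or numeric literal
    op: str       # one of the keys of OPS
    right: str    # field name or numeric literal
    message: str  # shown when the rule is violated


def parse_rule(line: str) -> Rule:
    """Parse a line of the form 'name: left op right -> message'."""
    name, rest = line.split(":", 1)
    condition, message = rest.split("->", 1)
    left, op, right = condition.split()
    if op not in OPS:
        raise ValueError(f"Unsupported operator: {op}")
    return Rule(name.strip(), left, op, right, message.strip())


def _resolve(token: str, record: dict):
    """Look up a field in the record, else read the token as a number."""
    if token in record:
        return record[token]
    return float(token)  # raises ValueError for an unknown field name


def check(rule: Rule, record: dict) -> Optional[str]:
    """Return the rule's message if violated, or None if it passes.

    Rules whose fields are missing or empty are skipped (an assumption;
    missing data could instead be reported separately).
    """
    try:
        lhs = _resolve(rule.left, record)
        rhs = _resolve(rule.right, record)
    except ValueError:
        return None
    if lhs is None or rhs is None:
        return None
    return None if OPS[rule.op](lhs, rhs) else rule.message
```

A rule file would then be a plain list of such lines, for example `neutrophils_below_wbc: neutrophils < wbc -> Neutrophil count should be below WBC`, which keeps project-specific rules out of the Python code itself.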

I think this would take some work. The data quality pipeline that has been implemented for the ISARIC global dengue database is very valuable, but it should be designed to be more consistent and scalable than ad-hoc, project-specific Python scripts.

Also, some suggestions for pre-defined data quality rules:
- pregnancy-related checks
- severity scores that are defined as integers should take integer values
- clinical scores should be consistent with the raw values they are derived from (quite complex)
- the estimated total number of lesions should be higher than the sum of the lesion counts
- different ways of asking the same question should give compatible answers, e.g. medications, invasive ventilation
- all blood cell count data should make sense, e.g. each of neutrophil / lymphocyte / eosinophil < WBC
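
As a rough illustration of how a few of the suggestions above might look as reusable, pre-defined checks, here is a hedged Python sketch. The function names and the field names passed to them are hypothetical, records are assumed to be plain dictionaries with numeric values, and all blood counts are assumed to be in the same units. The lesion check only flags a total that is lower than the sum; whether an equal total should also be flagged is left open.

```python
def check_integer_score(record, field):
    """Flag a severity score that is present but not integer-valued."""
    value = record.get(field)
    if value is None or float(value).is_integer():
        return []
    return [f"{field} should be an integer, got {value}"]


def check_lesion_total(record, total_field, count_fields):
    """Flag an estimated lesion total that is lower than the summed counts."""
    total = record.get(total_field)
    counts = [record[f] for f in count_fields if record.get(f) is not None]
    if total is None or not counts:
        return []  # nothing to compare
    if total < sum(counts):
        return [f"{total_field} ({total}) is lower than the sum of lesion counts ({sum(counts)})"]
    return []


def check_differential_below_wbc(record, wbc_field, differential_fields):
    """Flag any differential count that is not below the WBC count."""
    wbc = record.get(wbc_field)
    if wbc is None:
        return []
    return [
        f"{f} ({record[f]}) should be less than {wbc_field} ({wbc})"
        for f in differential_fields
        if record.get(f) is not None and record[f] >= wbc
    ]
```

Each function returns a list of messages, so the results of many checks can be concatenated per record and reported together.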



Data quality rules #17