Data quality rules #17

@tedinburgh

Description

I'm not sure there is a place yet for data quality rules but, of the existing repositories, this seems the closest home for defining a structure for them.

Ultimately it would be useful for the user to provide a list of data quality rules (from a pre-existing/pre-defined ruleset or custom rules) in a human-readable format, with a well-defined syntax that allows the user to define custom rules (and error messages or actions). Alternatively this could mirror REDCap's syntax for skip logic?
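To make the idea concrete, here is a minimal sketch of what a human-readable rule file and a small evaluator could look like. Everything in it is an assumption for illustration: the JSON layout, the `field operator value` check syntax, the field names (`wbc`, `neutrophils`, `age`) and the function names are hypothetical, and it does not attempt to mirror REDCap's skip-logic syntax.

```python
import json
import operator

# Hypothetical rule file: each rule has an id, a check written as
# "<field> <operator> <field or number>", and an error message.
RULES_JSON = """
[
  {"id": "neut_le_wbc", "check": "neutrophils <= wbc",
   "message": "Neutrophil count exceeds total WBC"},
  {"id": "age_nonneg", "check": "age >= 0",
   "message": "Age is negative"}
]
"""

RULES = json.loads(RULES_JSON)

# Comparison operators allowed in the (assumed) check syntax.
OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}


def resolve(token, record):
    """Return the record value for a field name, or the number for a literal."""
    if token in record:
        return record[token]
    return float(token)  # raises ValueError for a field missing from the record


def run_rules(rules, record):
    """Return the messages of all rules that the record violates."""
    failures = []
    for rule in rules:
        left, op, right = rule["check"].split()
        try:
            lhs, rhs = resolve(left, record), resolve(right, record)
        except ValueError:
            continue  # a referenced field is absent: skip rather than fail
        if lhs is None or rhs is None:
            continue  # missing values are not treated as violations here
        if not OPS[op](lhs, rhs):
            failures.append(rule["message"])
    return failures
```

For example, `run_rules(RULES, {"neutrophils": 12.0, "wbc": 8.0, "age": 30})` flags the first rule only. A real syntax would need more than single comparisons (arithmetic, boolean combinations, conditional rules), which is where reusing an existing grammar might pay off.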

I think this would take some work. The data quality pipeline implemented for the ISARIC global dengue database is very valuable, but a general solution should be designed to be more consistent and scalable than ad hoc, project-specific Python scripts.

Also some suggestions for pre-defined data quality rules:

  • pregnancy related
  • integer-valued severity scores are integer-valued
  • clinical scores are consistent with raw values (quite complex)
  • estimated total number of lesions should be higher than sum of lesion counts
  • compatibility between different ways of asking the same question e.g. medications, invasive ventilation
  • all blood cell count data should make sense, e.g. neutrophil / lymphocyte / eosinophil < WBC
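A few of the suggestions above could be sketched as reusable Python checks, as below. This is only an illustration under stated assumptions: the `Rule` class, the `validate` function and all field names (`severity_score`, `lesions_total_est`, `lesion_counts`, `wbc`, ...) are hypothetical, and the comparisons are non-strict (`>=`, `<=`), which may or may not be the intended strictness.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    # Returns True (pass), False (fail) or None (not applicable / data missing).
    check: Callable[[dict], Optional[bool]]
    message: str


def present(record, *fields):
    """True if every named field exists in the record and is not None."""
    return all(record.get(f) is not None for f in fields)


def score_is_integer(record):
    # Integer-valued severity scores should be integer-valued.
    if not present(record, "severity_score"):
        return None
    return float(record["severity_score"]).is_integer()


def lesion_total_consistent(record):
    # Estimated total should be at least the sum of the individual counts
    # (non-strict comparison is an assumption).
    if not present(record, "lesions_total_est", "lesion_counts"):
        return None
    return record["lesions_total_est"] >= sum(record["lesion_counts"])


def differential_below_wbc(record):
    # Each differential count should not exceed the total WBC count.
    if not present(record, "wbc"):
        return None
    fields = ("neutrophils", "lymphocytes", "eosinophils")
    parts = [record[f] for f in fields if record.get(f) is not None]
    if not parts:
        return None
    return all(p <= record["wbc"] for p in parts)


PREDEFINED_RULES = [
    Rule("score_integer", score_is_integer, "Severity score is not an integer"),
    Rule("lesion_total", lesion_total_consistent,
         "Estimated lesion total is below the sum of lesion counts"),
    Rule("diff_le_wbc", differential_below_wbc,
         "A differential count exceeds total WBC"),
]


def validate(record, rules=PREDEFINED_RULES):
    """Return the messages of rules that explicitly fail for this record."""
    return [r.message for r in rules if r.check(record) is False]
```

Returning `None` for missing data keeps "not applicable" separate from "failed", so incomplete records are not reported as errors by these checks.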
