BioRDM/data-wrangling-with-r

Data wrangling with R

Showcase notebooks for data cleaning with R, inspired by our real data curation work for a project monitoring COVID virus levels in wastewater around Scotland (see the project page).

The notebooks capture typical workflows for curating biomedical data (and are probably also suitable for any other discipline where data are entered manually into spreadsheets).

The folder structure is (out and data are not under version control):

- data - curated input data
- out - generated data in different forms
- raw_data - input data (COVID prevalence, sites coordinates and population)
- src - RMarkdown notebooks for data cleaning
  • data-wrangling-01 - renaming columns to follow a consistent naming convention and the tidy table recommendations

  • data-wrangling-02 - finding misspelled entries, checking data sanity (number of records per site, negative values, etc.), replacing misspelled entries, changing the date format to ISO, adding missing entries

  • data-wrangling-03 - preparing data from one source for a join (prevalence with site locations)

  • data-wrangling-04 - curating population data

  • data-wrangling-05 - joining the two site tables (population and coordinates) into one

  • data-wrangling-06 - merging the main COVID data with coordinates and population (denormalization of the data)

  • data-wrangling-07 - creation of timeseries data, wide table with a column for each date

  • data-wrangling-08 - creation of aggregated timeseries data (per week number)

  • data-wrangling-09 - preparation of breakpoints for a color scale to be used with a heatmap

  • data-wrangling-10 - generating heatmaps with virus levels per date
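
The notebook steps listed above can be sketched end to end on a toy table. This is only a minimal illustration in base R, not the code from the notebooks: the site names, column names, values, the `to_snake_case` helper, the day/month/year input format and the quantile-based breakpoints are all assumptions made up for the example, and the notebooks themselves may rely on other packages and rules. Plotting the heatmap is left out.

```r
# Invented raw data, as it might look after manual entry in a spreadsheet.
raw <- data.frame(
  "Site Name"   = c("Northtown", "Northtown", "Nothtown", "Southport", "Southport"),
  "Sample Date" = c("01/03/2021", "08/03/2021", "09/03/2021", "01/03/2021", "08/03/2021"),
  "Virus Level" = c(120, 80, 100, 40, -5),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Step 01: consistent column names (hypothetical helper, snake_case assumed).
to_snake_case <- function(x) gsub("[^a-z0-9]+", "_", tolower(trimws(x)))
names(raw) <- to_snake_case(names(raw))

# Step 02: spot misspellings via record counts per site, then fix them
# with an explicit lookup table so every correction is documented.
print(table(raw$site_name))
corrections <- c(Nothtown = "Northtown")
fix <- raw$site_name %in% names(corrections)
raw$site_name[fix] <- unname(corrections[raw$site_name[fix]])

# Step 02: sanity check, negative levels are treated as missing here (assumption).
raw$virus_level[raw$virus_level < 0] <- NA

# Step 02: parse day/month/year strings; Date objects print in ISO 8601 form.
raw$sample_date <- as.Date(raw$sample_date, format = "%d/%m/%Y")

# Steps 03-06: join with a site table (invented coordinates and population).
sites <- data.frame(
  site_name  = c("Northtown", "Southport"),
  latitude   = c(1.0, 2.0),
  longitude  = c(10.0, 20.0),
  population = c(1000, 2000),
  stringsAsFactors = FALSE
)
full <- merge(raw, sites, by = "site_name", all.x = TRUE)

# Step 07: wide time series, one row per site and one column per date.
wide <- tapply(full$virus_level,
               list(full$site_name, format(full$sample_date)),
               mean)

# Step 08: weekly aggregation by ISO week number; rows with NA are dropped.
full$week <- format(full$sample_date, "%V")
weekly <- aggregate(virus_level ~ site_name + week, data = full, FUN = mean)

# Step 09: breakpoints for a colour scale, here simply the quartiles.
breaks <- quantile(full$virus_level, probs = seq(0, 1, 0.25), na.rm = TRUE)

print(wide)
print(weekly)
print(breaks)
```

Keeping the spelling corrections in a named lookup vector rather than editing values by hand makes each change traceable, which matters when the raw spreadsheets are updated and the cleaning has to be rerun.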
