-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathadding_data_brief.qmd
52 lines (38 loc) · 3.3 KB
/
adding_data_brief.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
# Adding datasets, an introduction
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
results = "asis",
echo = FALSE,
message = FALSE,
warning = FALSE
)
library(traits.build)
my_kable_styling <- util_kable_styling_html
```
```{r, echo=FALSE, results='hide', message=FALSE}
## Loads austraits into global name space
austraits <- austraits:::austraits_5.0.0_lite
schema <- get_schema()
definitions <- austraits$definitions
```
This section gives a brief overview of how to add datasets to the database. For a more detailed guide, see the **Guide to adding data** part of this book, starting with [Tutorial: Adding datasets](tutorial_datasets.html).
All steps must be followed for the automated workflow to proceed without problems.
1. Install the `{traits.build}` R-package from github
2. Clone the `traits.build-template` repository from github
3. For each dataset, create a new branch in the repo, named for the new `dataset_id` in `author_year` format, e.g. `Gallagher_2014`.
4. Create a new folder within the folder `data` with the name `dataset_id`, e.g. `Gallagher_2014`.
5. Prepare the file `data.csv` and place it within the new folder.
6. Prepare the file `metadata.yml` and place it within the new folder.
7. Run tests on the newly added dataset and correct the `data.csv` and `metadata.yml` files as necessary.
8. Add the new study into the build framework and rebuild the trait database, by running `build_setup_pipeline()` and `source(build.R)`.
You can then rebuild the database, including the new dataset.
9. Run quality checks on the newly added dataset and correct the `data.csv` and `metadata.yml` files as necessary.
10. Generate and proofread a report on the data. In particular, check that numeric trait values fall within a logical range relative to other studies, and that individual trait observations are not unnecessarily excluded because their trait values are unsupported.
11. Return to step 6 if changes are made to the `data.csv` or `metadata.yml` files.
12. Push the GitHub branch to your database repository.
The best place to get started learning how to add datasets is to work through a series of 7 [tutorials](tutorial_dataset_1.html). Each introduces you to specific [traits.build functions](https://traitecoevo.github.io/traits.build/reference/index.html) designed to facilitate the addition of dataset metadata or the metadata formats for specific types of datasets.
The chapter [Adding datasets, a lengthy guide](tutorial_data_long.html) then offers a comprehensive guide to generating the `data.csv` and `metadata.yml` files and error-checking your results. This document is likely overwhelming until you are familiar with the traits.build workflow and metadata format.
It may also help to download one of the two [sample datasets](https://github.com/traitecoevo/traits.build-template/tree/master/data) to use as a template for your own files and a guide on required content. Or alteratively, to see a greater diversity of dataset styles, look at the [austraits.build repository](https://github.com/traitecoevo/austraits.build/tree/master/data)
You should look at the files in the [config folder](https://github.com/traitecoevo/austraits.build/tree/master/config), particularly the `definitions` file for the list of traits in AusTraits and how trait definitions are formatted.