adding_data_brief.qmd

# Adding datasets, an introduction

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  results = "asis",
  echo = FALSE,
  message = FALSE,
  warning = FALSE
)

library(traits.build)

my_kable_styling <- util_kable_styling_html
```

```{r, echo=FALSE, results='hide', message=FALSE}
## Loads austraits into global name space
austraits <- austraits:::austraits_5.0.0_lite

schema <- get_schema()
definitions <- austraits$definitions
```

This section gives a brief overview of how to add datasets to the database. For a more detailed guide, see the **Guide to adding data** part of this book, starting with [Tutorial: Adding datasets](tutorial_datasets.html).

All steps must be followed for the automated workflow to proceed without problems.

1.  Install the `{traits.build}` R-package from github
2.  Clone the `traits.build-template` repository from github
3.  For each dataset, create a new branch in the repo, named for the new `dataset_id` in `author_year` format, e.g. `Gallagher_2014`.
4.  Create a new folder within the folder `data` with the name `dataset_id`, e.g. `Gallagher_2014`.
5.  Prepare the file `data.csv` and place it within the new folder.
6.  Prepare the file `metadata.yml` and place it within the new folder.
7.  Run tests on the newly added dataset and correct the `data.csv` and `metadata.yml` files as necessary.
8.  Add the new study into the build framework and rebuild the trait database, by running `build_setup_pipeline()` and `source(build.R)`.

You can then rebuild the database, including the new dataset.

9.  Run quality checks on the newly added dataset and correct the `data.csv` and `metadata.yml` files as necessary.
10.  Generate and proofread a report on the data. In particular, check that numeric trait values fall within a logical range relative to other studies, and that individual trait observations are not unnecessarily excluded because their trait values are unsupported.
11. Return to step 6 if changes are made to the `data.csv` or `metadata.yml` files.
12. Push the GitHub branch to your database repository.

The best place to get started learning how to add datasets is to work through a series of 7 [tutorials](tutorial_dataset_1.html). Each introduces you to specific [traits.build functions](https://traitecoevo.github.io/traits.build/reference/index.html) designed to facilitate the addition of dataset metadata or the metadata formats for specific types of datasets. 

The chapter [Adding datasets, a lengthy guide](tutorial_data_long.html) then offers a comprehensive guide to generating the `data.csv` and `metadata.yml` files and error-checking your results. This document is likely overwhelming until you are familiar with the traits.build workflow and metadata format.

It may also help to download one of the two [sample datasets](https://github.com/traitecoevo/traits.build-template/tree/master/data) to use as a template for your own files and a guide on required content. Or alteratively, to see a greater diversity of dataset styles, look at the [austraits.build repository](https://github.com/traitecoevo/austraits.build/tree/master/data)

You should look at the files in the [config folder](https://github.com/traitecoevo/austraits.build/tree/master/config), particularly the `definitions` file for the list of traits in AusTraits and how trait definitions are formatted.