Add GitHub chapter (#17)
* add github chapter
* add versioned_releases chapter
---------

Co-authored-by: ehwenk <[email protected]>
yangsophieee and ehwenk authored Dec 14, 2023
1 parent 51b8de8 commit 7b5518c
Showing 13 changed files with 226 additions and 124 deletions.
8 changes: 5 additions & 3 deletions _quarto.yml
@@ -4,7 +4,7 @@ project:
book:
title: "The {traits.build} data standard, R package, and workflow"
author: ["Elizabeth Wenk", "Daniel Falster", "Sophie Yang", "Fonti Kar"]
- page-footer: "How to get help: <https://traitecoevo.github.io/traits.build-book/help.html><br>Copyright 2023, Daniel Falster and Elizabeth Wenk"
+ page-footer: "How to get help: <https://traitecoevo.github.io/traits.build-book/help.html><br>Copyright 2023, Daniel Falster and Elizabeth Wenk"
page-navigation: true
chapters:
- part: Introduction
@@ -13,7 +13,7 @@ book:
- motivation.qmd
- workflow.qmd
- usage_examples.qmd
- - part: Data structure and standard
+ - part: Data structure and standard
chapters:
- long_wide.qmd
- database_standard.qmd
@@ -41,6 +41,8 @@ book:
- adding_data_long.qmd
- check_dataset_functions.qmd
- data_common_issues.qmd
+ - github.qmd
+ - versioned_releases.qmd
- part: Using outputs of `traits.build`
chapters:
- austraits_database.qmd
@@ -56,7 +58,7 @@
appendices:
- csv.qmd
- yaml.qmd

format:
html:
theme: cosmo
4 changes: 2 additions & 2 deletions check_dataset_functions.qmd
@@ -161,7 +161,7 @@ dataset_check_outlier_by_species <- function(database, dataset, trait, multiplie
comparisons) %>%
dplyr::filter(as.numeric(value) > multiplier*mean_value | as.numeric(value) < (1/multiplier)*mean_value) %>%
dplyr::mutate(value_ratio = as.numeric(value)/mean_value) %>%
- dplyr::arrange(value_ratio)
+ dplyr::arrange(dplyr::desc(value_ratio))
need_review
@@ -213,7 +213,7 @@ dataset_check_outlier_by_genus <- function(database, dataset, trait, multiplier)
comparisons) %>%
dplyr::filter(as.numeric(value) > multiplier*mean_value | as.numeric(value) < (1/multiplier)*mean_value) %>%
dplyr::mutate(value_ratio = as.numeric(value)/mean_value) %>%
- dplyr::arrange(value_ratio)
+ dplyr::arrange(dplyr::desc(value_ratio))
need_review
14 changes: 7 additions & 7 deletions database_structure.qmd
@@ -241,7 +241,7 @@ elements <- schema$austraits$elements$excluded_data
writeLines(c(""))
```

- ## taxa
+ ## Taxa

**Description:** A table containing details on taxa that are included in the table [`traits`](#traits). We have attempted to align species names with known taxonomic units in the [`Australian Plant Census` (APC)](https://biodiversity.org.au/nsl/services/apc) and/or the [`Australian Plant Names Index` (APNI)](https://biodiversity.org.au/nsl/services/APNI); the sourced information is released under a CC-BY3 license.

@@ -256,7 +256,7 @@ elements <- schema$austraits$elements$taxa
writeLines(c(""))
```

- ## taxonomic_updates
+ ## Taxonomic_updates

```{r}
elements <- schema$austraits$elements$taxonomic_updates
@@ -273,7 +273,7 @@ elements <- schema$austraits$elements$taxonomic_updates

Both the original and the updated taxon names are included in the [`traits`](#traits) table.

- ## definitions
+ ## Definitions

```{r}
elements <- schema$austraits$elements$definitions
@@ -300,7 +300,7 @@ for (trait in c("leaf_mass_per_area", "woodiness")) {
}
```

- ## contributors
+ ## Contributors

```{r}
elements <- schema$austraits$elements$contributors
@@ -315,7 +315,7 @@ elements <- schema$austraits$elements$contributors
writeLines(c(""))
```

- ## sources
+ ## Sources

For each dataset in the compilation there is the option to list primary and secondary citations. The primary citation is defined as, `r austraits$schema$metadata$elements$source$values$primary$description` The secondary citation is defined as, `r austraits$schema$metadata$elements$source$values$secondary$description`

@@ -334,7 +334,7 @@ austraits$sources["Falster_2005_1"]

A formatted version of the sources also exists within the table [methods](#methods).

- ## metadata
+ ## Metadata

```{r}
elements <- schema$austraits$elements$metadata
@@ -346,7 +346,7 @@ elements <- schema$austraits$elements$metadata
writeLines(c(""))
```

- ## build_info
+ ## Build_info

```{r}
elements <- schema$austraits$elements$build_info
61 changes: 61 additions & 0 deletions github.qmd
@@ -0,0 +1,61 @@
# Using GitHub

## Working with your GitHub repository

For {traits.build} users, the preferred way of hosting your database is on GitHub.

### Setting up the repository

There are some GitHub settings we recommend:
- `General`: Enable "Always suggest updating pull request branches" to keep the branch up to date with the main branch before merging
- `General`: Enable "Automatically delete head branches" to delete the branch after merging, which keeps your branches clean
- `Branches`: Add a branch protection rule for your main or develop branch and enable "Require a pull request before merging", "Require conversation resolution before merging", and "Require deployments to succeed before merging"

#### Automated tests during pull requests

To run automated tests that must pass before a pull request can be merged, you can set up GitHub workflows via the Actions tab on GitHub. The setting "Require deployments to succeed before merging" must be enabled for the `main` or `develop` branch. You can write your own workflows, which are stored in `.github/workflows/`. For {austraits.build}, the GitHub workflow runs `dataset_test` on all data sources and compiles the database (see [here](https://github.com/traitecoevo/austraits.build/blob/51964dbe4d302c6dade51db133e9e32514cddaae/.github/workflows/check-build.yml)).
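As an illustrative sketch of what such a workflow file might contain (the file name, triggers, and steps below are assumptions for illustration, not the actual {austraits.build} workflow):

```yaml
# .github/workflows/check-build.yml -- hypothetical example
name: check-build

on:
  pull_request:
    branches: [main, develop]

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      - name: Install traits.build
        run: Rscript -e 'install.packages("remotes"); remotes::install_github("traitecoevo/traits.build")'
      - name: Run dataset tests and build the database
        run: Rscript -e 'source("build.R")'
```

Because the job is triggered on `pull_request`, its pass/fail status shows up as a check on the PR, which the branch protection rule can then require.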


### Adding to the repository

New data can be added to the repository by creating a branch and then opening a [pull request](https://help.github.com/articles/using-pull-requests/) (PR). Those who want to contribute but aren't approved maintainers of the database must first fork and clone the repository from GitHub.

In short,

1. Create a Git branch for your new work, either within the repo (if you are an approved contributor) or as a [fork of the repo](https://help.github.com/en/github/getting-started-with-github/fork-a-repo).
2. Make commits and push these up onto the branch.
3. Make sure everything runs fine before you send a PR (see [tutorials for adding datasets](tutorial_datasets.html)).
4. Submit the PR and tag someone as a reviewer.
5. Squash and merge the PR once approved and any changes have been made.

**Tip**: For working with git and GitHub, we recommend GitHub Desktop, a user-friendly graphical interface tool.
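On the command line, steps 1 and 2 above might look like the following (the branch and file names are illustrative; the throwaway repository here is only so the snippet is self-contained — in practice you would run these commands inside your clone):

```shell
set -e
cd "$(mktemp -d)"                       # demo in a throwaway repo
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "init"

git checkout -q -b add_Smith_1996       # 1. create a branch for the new work
mkdir -p data/Smith_1996
touch data/Smith_1996/data.csv          # placeholder for the new dataset files
git add data/Smith_1996/                # stage the new dataset files
git commit -q -m "Smith_1996: Add study"
git branch --show-current
# git push -u origin add_Smith_1996     # 2. push the branch (requires a remote)
```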

#### Merging a pull request

The easiest way to merge a PR is to use GitHub's built-in options for squashing and merging. This leads to:

- A single commit
- Attribution of the work to the original author

You can merge a PR after it has been approved. To merge a PR, you need to be an approved maintainer, but you do not need to be the original author of the PR (the commit will still be attributed to the original author).

1. Send the PR.
2. Tag someone to review.
3. If there are any updates to the main branch, merge those into your new branch and resolve any conflicts.
4. Once ready, merge into the main branch, choosing "Squash & Merge", using an informative commit message. "Squash" merges all your commits on the branch into one.

##### Commit messages

Informative commit messages are ideal. They should clearly describe the work done and the value added to the database in a few clear, bulleted points. If relevant, they should reference any GitHub issues. You can [link to and directly close GitHub issues via the commit message](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword). To link to another commit, you can also use its SHA hash or its 7-character prefix.

An example commit message:

```
Smith_1996: Add study
- For #224, closes #286
- Trait data for Nothofagus forests across Australia, New Zealand and South America
```

## Bugs and feature requests for {traits.build}

If you find a bug or have a feature request for {traits.build}, [file a GitHub issue](https://github.com/traitecoevo/traits.build/issues). Illustrate the bug with a minimal [reprex](https://www.tidyverse.org/help/#reprex) (reproducible example). Please feel free to contribute by implementing the fix or feature yourself via a pull request. For substantial pull requests, it is best to first check with the {traits.build} team that the problem is worth pursuing.
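A reprex for a bug report might look like the following sketch (the dataset and the commented output are placeholders; the exact arguments to `dataset_test` are illustrative, so check the package documentation):

```{r, eval=FALSE}
# Minimal reproducible example: the smallest code that triggers the problem
library(traits.build)

dataset_test("tutorial_dataset_1")
# paste the unexpected failure or error message here,
# along with the output of sessionInfo()
```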
2 changes: 1 addition & 1 deletion traits_build.qmd
@@ -8,7 +8,7 @@ The core components of the `{traits.build}` package are:

1. 15 functions, supplemented by a detailed [protocol](tutorial_datasets.html), to wrangle diverse datasets into input files with a common structure that captures both the trait data and all essential metadata and context properties. These input files are: a table (`data.csv`) containing all trait data, taxon names, location names (if relevant), and any context properties (if relevant); and a structured metadata file (`metadata.yml`) that assigns the columns from the `data.csv` file to their specific variables and maps all additional dataset metadata in a structured format.

- 2. An R-based pipeline to combine the input files into a single harmonised database with aligned trait names, aligned units, aligned categorical trait values, and aligned taxon names. Four dataset-specific configuration files are required for the build process, 1) a trait dictionary; 2) a units conversion file; 3) a taxon list; and 4) a database metadata file.
+ 2. An R-based pipeline to combine the input files into a single harmonised database with aligned trait names, aligned units, aligned categorical trait values, and aligned taxon names. Four database-specific configuration files are required for the build process, 1) a trait dictionary; 2) a units conversion file; 3) a taxon list; and 4) a database metadata file.

Guided by the information in the configuration files, the R-scripted workflow combines the `data.csv` and `metadata.yml` files for the individual datasets into a unified, harmonised database. There are three distinct steps to this process, executed by a trio of functions: `dataset_configure`, `dataset_process`, and `dataset_taxonomic_updates`. These functions cyclically build each dataset, only combining the datasets into a single database at the end of the workflow.
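A sketch of this cycle (the function names come from the package, but the loop and elided arguments below are illustrative rather than the package's actual build code):

```{r, eval=FALSE}
# Illustrative only: arguments elided, see the package documentation
databases <- list()

for (dataset_id in dataset_ids) {
  databases[[dataset_id]] <- dataset_id %>%
    dataset_configure(...) %>%          # attach configuration files
    dataset_process(...) %>%            # harmonise traits, units, values
    dataset_taxonomic_updates(...)      # align taxon names
}

# Only now are the per-dataset builds combined into a single database
```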

64 changes: 32 additions & 32 deletions tutorial_dataset_1.qmd
@@ -12,7 +12,7 @@ Before you begin this tutorial, ensure you have installed traits.build, cloned t

- Learn how to [merge a new dataset](#build_pipeline) into a `traits.build` database.

- ### New Functions Introduced
+ ### New functions introduced

- metadata_create_template

@@ -40,7 +40,7 @@ In the traits.build-template repository, there is a folder titled `tutorial_data

- There is a folder `raw` nested within the `tutorial_dataset_1` folder, that contains one file, `notes.txt`

- ### source necessary functions
+ ### Source necessary functions

- Source the functions in the `traits.build` package:

@@ -140,7 +140,7 @@ A follow-up question then allows you to add a fixed `collection_date` as a range

[Enter collection_date range in format '2007/2009':]{style="color:blue;"} [**2002-11/2002-11**]{style="color:red;"}\

- A final user prompt asks if, for any traits, a sequence of rows represents repeat observations.\
+ A final user prompt asks if, for any traits, a sequence of rows represents repeat observations.\

[Do all traits need `repeat_measurements_id`'s?]{style="color:blue;"}

@@ -168,7 +168,7 @@ metadata_add_source_doi(dataset_id = "tutorial_dataset_1", doi = "10.1111/j.0022
The following information is automatically propagated into the source field:

```{r, eval=FALSE}
- primary:
+ primary:
key: Test_1
bibtype: Article
year: '2005'
@@ -286,30 +286,30 @@ You select columns 3, 4, 5, as these contain trait data.

```{r, eval=FALSE}
traits:
- - var_in: LMA (mg mm-2)
-   unit_in: unknown
-   trait_name: unknown
-   entity_type: unknown
-   value_type: unknown
-   basis_of_value: unknown
-   replicates: unknown
-   methods: unknown
- - var_in: Leaf nitrogen (mg mg-1)
-   unit_in: unknown
-   trait_name: unknown
-   entity_type: unknown
-   value_type: unknown
-   basis_of_value: unknown
-   replicates: unknown
-   methods: unknown
- - var_in: leaf size (mm2)
-   unit_in: unknown
-   trait_name: unknown
-   entity_type: unknown
-   value_type: unknown
-   basis_of_value: unknown
-   replicates: unknown
-   methods: unknown
+ - var_in: LMA (mg mm-2)
+   unit_in: unknown
+   trait_name: unknown
+   entity_type: unknown
+   value_type: unknown
+   basis_of_value: unknown
+   replicates: unknown
+   methods: unknown
+ - var_in: Leaf nitrogen (mg mg-1)
+   unit_in: unknown
+   trait_name: unknown
+   entity_type: unknown
+   value_type: unknown
+   basis_of_value: unknown
+   replicates: unknown
+   methods: unknown
+ - var_in: leaf size (mm2)
+   unit_in: unknown
+   trait_name: unknown
+   entity_type: unknown
+   value_type: unknown
+   basis_of_value: unknown
+   replicates: unknown
+   methods: unknown
```

------------------------------------------------------------------------
@@ -401,15 +401,15 @@ If the units being read in for a specific trait differ from those defined for th

#### **Final steps**

- ##### **double check the metadata.yml file**
+ ##### **Double check the metadata.yml file**

You should now have a completed `metadata.yml` file, with no `unknown` fields.

You'll notice five sections we haven't used, `contexts`, `substitutions`, `taxonomic_updates`, `exclude_observations`, and `questions`.

These should each contain an `.na` (as in `substitutions: .na`). They will be explored in future lessons.

- ##### **run tests on the metadata file**
+ ##### **Run tests on the metadata file**

Confirm there are no errors in the `metadata.yml` file:

@@ -421,7 +421,7 @@ This *should* result in the following output:

[\[ FAIL 0 \| WARN 0 \| SKIP 0 \| PASS 79 \]]{style="color:blue;"}\

- ##### **add dataset to the database** {#build_pipline}
+ ##### **Add dataset to the database** {#build_pipeline}

Next, add the dataset_id to the build file, then rebuild the database:

Expand All @@ -430,7 +430,7 @@ build_setup_pipeline(method = "base", database_name = "traits.build_database")
source("build.R")
```

- ##### **build dataset report**
+ ##### **Build dataset report**

As a final step, build a report for the study

40 changes: 20 additions & 20 deletions tutorial_dataset_2.qmd
@@ -18,7 +18,7 @@ Before you begin this tutorial, ensure you have installed traits.build, cloned t

- Understand the importance of having the [dataset pivot](#dataset_pivot).

- ### New Functions Introduced
+ ### New functions introduced

- metadata_add_substitution

@@ -125,14 +125,14 @@ Then rename your columns to match those in use:
locations <-
locations %>%
rename(
-     `longitude (deg)` = long,
-     `latitude (deg)` = lat,
-     `description` = vegetation,
-     `elevation (m)` = elevation,
-     `precipitation, MAP (mm)` = MAP,
-     `soil P, total (mg/kg)` = `soil P`,
-     `soil N, total (ppm)` = `soil N`,
-     `geology (parent material)` = `parent material`
+     `longitude (deg)` = long,
+     `latitude (deg)` = lat,
+     `description` = vegetation,
+     `elevation (m)` = elevation,
+     `precipitation, MAP (mm)` = MAP,
+     `soil P, total (mg/kg)` = `soil P`,
+     `soil N, total (ppm)` = `soil N`,
+     `geology (parent material)` = `parent material`
)
```

@@ -407,7 +407,7 @@ custom_R_code: '
mutate(
across(c("TRAIT Leaf Dry Mass UNITS g"), ~na_if(.x,0))
) %>%
- group_by(name_original) %>%
+ group_by(name_original) %>%
mutate(
across(c("TRAIT Growth Form CATEGORICAL EP epiphyte (mistletoe) F fern G grass H herb S shrub T tree V vine"), replace_duplicates_with_NA)
) %>%
@@ -427,23 +427,23 @@ Then rebuild the database and look at the output in the traits table for one of
source("build.R")
traits.build_database$traits %>%
- filter(dataset_id == "tutorial_dataset_2") %>%
+ filter(dataset_id == "tutorial_dataset_2") %>%
filter(taxon_name == "Actinotus minor") %>% View()
dataset_id taxon_name observation_id trait_name value unit entity_type location_id
- <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
- 1 tutorial_dataset_2 Actinotus minor 010 leaf_area 18.8 mm2 population 02
- 2 tutorial_dataset_2 Actinotus minor 010 leaf_dry_mass 7 mg population 02
- 3 tutorial_dataset_2 Actinotus minor 010 leaf_mass_per_area 344.827586206897 g/m2 population 02
- 4 tutorial_dataset_2 Actinotus minor 011 leaf_area 75.9 mm2 population 03
- 5 tutorial_dataset_2 Actinotus minor 011 leaf_dry_mass 7 mg population 03
- 6 tutorial_dataset_2 Actinotus minor 011 leaf_mass_per_area 89.2857142857143 g/m2 population 03
- 7 tutorial_dataset_2 Actinotus minor 012 plant_growth_form herb NA species NA
+ <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
+ 1 tutorial_dataset_2 Actinotus minor 010 leaf_area 18.8 mm2 population 02
+ 2 tutorial_dataset_2 Actinotus minor 010 leaf_dry_mass 7 mg population 02
+ 3 tutorial_dataset_2 Actinotus minor 010 leaf_mass_per_area 344.827586206897 g/m2 population 02
+ 4 tutorial_dataset_2 Actinotus minor 011 leaf_area 75.9 mm2 population 03
+ 5 tutorial_dataset_2 Actinotus minor 011 leaf_dry_mass 7 mg population 03
+ 6 tutorial_dataset_2 Actinotus minor 011 leaf_mass_per_area 89.2857142857143 g/m2 population 03
+ 7 tutorial_dataset_2 Actinotus minor 012 plant_growth_form herb NA species NA
```

The measurements for the three numeric traits from a single location share a common `observation_id`, as they are all part of an observation of a common entity (a specific population of *Actinotus minor*) at a single location at a single point in time. However, the row with the plant growth form measurement has a separate `observation_id`, reflecting that this is an observation of a different entity (the taxon *Actinotus minor*).

- ##### **build dataset report**
+ ##### **Build dataset report**

As a final step, build a report for the study
