Add GitHub chapter (#17)
* add github chapter
* add versioned_releases chapter
---------

Co-authored-by: ehwenk <[email protected]>
yangsophieee and ehwenk authored Dec 14, 2023
1 parent 51b8de8 commit 7b5518c
Showing 13 changed files with 226 additions and 124 deletions.
8 changes: 5 additions & 3 deletions _quarto.yml
@@ -4,7 +4,7 @@ project:
book:
title: "The {traits.build} data standard, R package, and workflow"
author: ["Elizabeth Wenk", "Daniel Falster", "Sophie Yang", "Fonti Kar"]
- page-footer: "How to get help: <https://traitecoevo.github.io/traits.build-book/help.html><br>Copyright 2023, Daniel Falster and Elizabeth Wenk"
+ page-footer: "How to get help: <https://traitecoevo.github.io/traits.build-book/help.html><br>Copyright 2023, Daniel Falster and Elizabeth Wenk"
page-navigation: true
chapters:
- part: Introduction
@@ -13,7 +13,7 @@ book:
- motivation.qmd
- workflow.qmd
- usage_examples.qmd
- - part: Data structure and standard
+ - part: Data structure and standard
chapters:
- long_wide.qmd
- database_standard.qmd
@@ -41,6 +41,8 @@ book:
- adding_data_long.qmd
- check_dataset_functions.qmd
- data_common_issues.qmd
+ - github.qmd
+ - versioned_releases.qmd
- part: Using outputs of `traits.build`
chapters:
- austraits_database.qmd
@@ -56,7 +58,7 @@
appendices:
- csv.qmd
- yaml.qmd

format:
html:
theme: cosmo
4 changes: 2 additions & 2 deletions check_dataset_functions.qmd
@@ -161,7 +161,7 @@ dataset_check_outlier_by_species <- function(database, dataset, trait, multiplie
comparisons) %>%
dplyr::filter(as.numeric(value) > multiplier*mean_value | as.numeric(value) < (1/multiplier)*mean_value) %>%
dplyr::mutate(value_ratio = as.numeric(value)/mean_value) %>%
- dplyr::arrange(value_ratio)
+ dplyr::arrange(dplyr::desc(value_ratio))
need_review
@@ -213,7 +213,7 @@ dataset_check_outlier_by_genus <- function(database, dataset, trait, multiplier)
comparisons) %>%
dplyr::filter(as.numeric(value) > multiplier*mean_value | as.numeric(value) < (1/multiplier)*mean_value) %>%
dplyr::mutate(value_ratio = as.numeric(value)/mean_value) %>%
- dplyr::arrange(value_ratio)
+ dplyr::arrange(dplyr::desc(value_ratio))
need_review
14 changes: 7 additions & 7 deletions database_structure.qmd
@@ -241,7 +241,7 @@ elements <- schema$austraits$elements$excluded_data
writeLines(c(""))
```

- ## taxa
+ ## Taxa

**Description:** A table containing details on taxa that are included in the table [`traits`](#traits). We have attempted to align species names with known taxonomic units in the [`Australian Plant Census` (APC)](https://biodiversity.org.au/nsl/services/apc) and/or the [`Australian Plant Names Index` (APNI)](https://biodiversity.org.au/nsl/services/APNI); the sourced information is released under a CC-BY3 license.

@@ -256,7 +256,7 @@ elements <- schema$austraits$elements$taxa
writeLines(c(""))
```

- ## taxonomic_updates
+ ## Taxonomic_updates

```{r}
elements <- schema$austraits$elements$taxonomic_updates
@@ -273,7 +273,7 @@ elements <- schema$austraits$elements$taxonomic_updates

Both the original and the updated taxon names are included in the [`traits`](#traits) table.

- ## definitions
+ ## Definitions

```{r}
elements <- schema$austraits$elements$definitions
@@ -300,7 +300,7 @@ for (trait in c("leaf_mass_per_area", "woodiness")) {
}
```

- ## contributors
+ ## Contributors

```{r}
elements <- schema$austraits$elements$contributors
@@ -315,7 +315,7 @@ elements <- schema$austraits$elements$contributors
writeLines(c(""))
```

- ## sources
+ ## Sources

For each dataset in the compilation there is the option to list primary and secondary citations. The primary citation is defined as, `r austraits$schema$metadata$elements$source$values$primary$description` The secondary citation is defined as, `r austraits$schema$metadata$elements$source$values$secondary$description`

@@ -334,7 +334,7 @@ austraits$sources["Falster_2005_1"]

A formatted version of the sources also exists within the table [methods](#methods).

- ## metadata
+ ## Metadata

```{r}
elements <- schema$austraits$elements$metadata
@@ -346,7 +346,7 @@ elements <- schema$austraits$elements$metadata
writeLines(c(""))
```

- ## build_info
+ ## Build_info

```{r}
elements <- schema$austraits$elements$build_info
61 changes: 61 additions & 0 deletions github.qmd
@@ -0,0 +1,61 @@
# Using GitHub

## Working with your GitHub repository

For {traits.build} users, the preferred way of hosting your database is on GitHub.

### Setting up the repository

There are some GitHub settings we recommend:
- `General`: Enable "Always suggest updating pull request branches" to keep the branch up to date with the main branch before merging
- `General`: Enable "Automatically delete head branches" to delete the branch after merging, which keeps your branches clean
- `Branches`: Add a branch protection rule for your main or develop branch and enable "Require a pull request before merging", "Require conversation resolution before merging", and "Require deployments to succeed before merging"

#### Automated tests during pull requests

To run automated tests that must pass before a pull request can be merged, you can set up GitHub workflows via the Actions tab on GitHub. The setting "Require deployments to succeed before merging" must be enabled for the `main` or `develop` branch. You can write your own workflows, which are stored in `.github/workflows/`. For {austraits.build}, the GitHub workflow runs `dataset_test` on all data sources and compiles the database (see [here](https://github.com/traitecoevo/austraits.build/blob/51964dbe4d302c6dade51db133e9e32514cddaae/.github/workflows/check-build.yml)).
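As an illustrative sketch of what such a workflow file might contain (the file name, triggers, and steps below are assumptions for illustration, not the actual {austraits.build} workflow):

```yaml
# .github/workflows/check-build.yml -- hypothetical example
name: check-build

on:
  pull_request:
    branches: [main, develop]

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      - name: Install traits.build
        run: Rscript -e 'install.packages("remotes"); remotes::install_github("traitecoevo/traits.build")'
      - name: Run dataset tests and build the database
        run: Rscript -e 'source("build.R")'
```

Because the job is triggered on `pull_request`, its pass/fail status shows up as a check on the PR, which the branch protection rule can then require.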


### Adding to the repository

New data can be added to the repository by creating a branch and then opening a [pull request](https://help.github.com/articles/using-pull-requests/) (PR). Those who want to contribute but aren't approved maintainers of the database must first fork and clone the repository from GitHub.

In short,

1. Create a Git branch for your new work, either within the repo (if you are an approved contributor) or as a [fork of the repo](https://help.github.com/en/github/getting-started-with-github/fork-a-repo).
2. Make commits and push these up onto the branch.
3. Make sure everything runs fine before you send a PR (see [tutorials for adding datasets](tutorial_datasets.html)).
4. Submit the PR and tag someone as a reviewer.
5. Squash and merge the PR once approved and any changes have been made.

**Tip**: For working with git and GitHub, we recommend GitHub Desktop, a user-friendly graphical interface tool.
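On the command line, steps 1 and 2 above might look like the following (the branch and file names are illustrative; the throwaway repository here is only so the snippet is self-contained — in practice you would run these commands inside your clone):

```shell
set -e
cd "$(mktemp -d)"                       # demo in a throwaway repo
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "init"

git checkout -q -b add_Smith_1996       # 1. create a branch for the new work
mkdir -p data/Smith_1996
touch data/Smith_1996/data.csv          # placeholder for the new dataset files
git add data/Smith_1996/                # stage the new dataset files
git commit -q -m "Smith_1996: Add study"
git branch --show-current
# git push -u origin add_Smith_1996     # 2. push the branch (requires a remote)
```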

#### Merging a pull request

The easiest way to merge a PR is to use GitHub's built-in options for squashing and merging. This leads to:

- A single commit
- Attribution of the work to the original author

You can merge a PR after it has been approved. To merge a PR, you need to be an approved maintainer, but you do not need to be the original author of the PR (the commit will still be attributed to the original author).

1. Send the PR.
2. Tag someone to review.
3. If there are any updates to the main branch, merge those into your new branch and resolve any conflicts.
4. Once ready, merge into the main branch, choosing "Squash & Merge", using an informative commit message. "Squash" merges all your commits on the branch into one.

##### Commit messages

Informative commit messages are ideal. They should clearly describe the work done and the value added to the database in a few clear, bulleted points. If relevant, they should reference any GitHub issues. You can [link to and directly close GitHub issues via the commit message](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword). To link to another commit, you can also use its SHA hash or its 7-character prefix.

An example commit message:

```
Smith_1996: Add study
- For #224, closes #286
- Trait data for Nothofagus forests across Australia, New Zealand and South America
```

## Bugs and feature requests for {traits.build}

If you find a bug or have a feature request for {traits.build}, [file a GitHub issue](https://github.com/traitecoevo/traits.build/issues). Illustrate the bug with a minimal [reprex](https://www.tidyverse.org/help/#reprex) (reproducible example). Please feel free to contribute by implementing the fix or feature yourself via a pull request. For substantial pull requests, it is best to first check with the {traits.build} team that the problem is worth pursuing.
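A reprex for a bug report might look like the following sketch (the dataset and the commented output are placeholders; the exact arguments to `dataset_test` are illustrative, so check the package documentation):

```{r, eval=FALSE}
# Minimal reproducible example: the smallest code that triggers the problem
library(traits.build)

dataset_test("tutorial_dataset_1")
# paste the unexpected failure or error message here,
# along with the output of sessionInfo()
```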
2 changes: 1 addition & 1 deletion traits_build.qmd
@@ -8,7 +8,7 @@ The core components of the `{traits.build}` package are:

1. 15 functions, supplemented by a detailed [protocol](tutorial_datasets.html), to wrangle diverse datasets into input files with a common structure that captures both the trait data and all essential metadata and context properties. These input files are: a table (`data.csv`) containing all trait data, taxon names, location names (if relevant), and any context properties (if relevant); and a structured metadata file (`metadata.yml`) that assigns the columns from the `data.csv` file to their specific variables and maps all additional dataset metadata in a structured format.

- 2. An R-based pipeline to combine the input files into a single harmonised database with aligned trait names, aligned units, aligned categorical trait values, and aligned taxon names. Four dataset-specific configuration files are required for the build process, 1) a trait dictionary; 2) a units conversion file; 3) a taxon list; and 4) a database metadata file.
+ 2. An R-based pipeline to combine the input files into a single harmonised database with aligned trait names, aligned units, aligned categorical trait values, and aligned taxon names. Four database-specific configuration files are required for the build process, 1) a trait dictionary; 2) a units conversion file; 3) a taxon list; and 4) a database metadata file.

Guided by the information in the configuration files, the R-scripted workflow combines the `data.csv` and `metadata.yml` files for the individual datasets into a unified, harmonised database. There are three distinct steps to this process, executed by a trio of functions: `dataset_configure`, `dataset_process`, and `dataset_taxonomic_updates`. These functions cyclically build each dataset, only combining the datasets into a single database at the end of the workflow.
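A sketch of this cycle (the function names come from the package, but the loop and elided arguments below are illustrative rather than the package's actual build code):

```{r, eval=FALSE}
# Illustrative only: arguments elided, see the package documentation
databases <- list()

for (dataset_id in dataset_ids) {
  databases[[dataset_id]] <- dataset_id %>%
    dataset_configure(...) %>%          # attach configuration files
    dataset_process(...) %>%            # harmonise traits, units, values
    dataset_taxonomic_updates(...)      # align taxon names
}

# Only now are the per-dataset builds combined into a single database
```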

64 changes: 32 additions & 32 deletions tutorial_dataset_1.qmd
@@ -12,7 +12,7 @@ Before you begin this tutorial, ensure you have installed traits.build, cloned t

- Learn how to [merge a new dataset](#build_pipeline) into a `traits.build` database.

- ### New Functions Introduced
+ ### New functions introduced

- metadata_create_template

@@ -40,7 +40,7 @@ In the traits.build-template repository, there is a folder titled `tutorial_data

- There is a folder `raw` nested within the `tutorial_dataset_1` folder, that contains one file, `notes.txt`

- ### source necessary functions
+ ### Source necessary functions

- Source the functions in the `traits.build` package:

@@ -140,7 +140,7 @@ A follow-up question then allows you to add a fixed `collection_date` as a range

[Enter collection_date range in format '2007/2009':]{style="color:blue;"} [**2002-11/2002-11**]{style="color:red;"}\

- A final user prompt asks if, for any traits, a sequence of rows represents repeat observations.\
+ A final user prompt asks if, for any traits, a sequence of rows represents repeat observations.\

[Do all traits need `repeat_measurements_id`'s?]{style="color:blue;"}

@@ -168,7 +168,7 @@ metadata_add_source_doi(dataset_id = "tutorial_dataset_1", doi = "10.1111/j.0022
The following information is automatically propagated into the source field:

```{r, eval=FALSE}
- primary:
+ primary:
key: Test_1
bibtype: Article
year: '2005'
@@ -286,30 +286,30 @@ You select columns 3, 4, 5, as these contain trait data.

```{r, eval=FALSE}
traits:
- - var_in: LMA (mg mm-2)
-   unit_in: unknown
-   trait_name: unknown
-   entity_type: unknown
-   value_type: unknown
-   basis_of_value: unknown
-   replicates: unknown
-   methods: unknown
- - var_in: Leaf nitrogen (mg mg-1)
-   unit_in: unknown
-   trait_name: unknown
-   entity_type: unknown
-   value_type: unknown
-   basis_of_value: unknown
-   replicates: unknown
-   methods: unknown
- - var_in: leaf size (mm2)
-   unit_in: unknown
-   trait_name: unknown
-   entity_type: unknown
-   value_type: unknown
-   basis_of_value: unknown
-   replicates: unknown
-   methods: unknown
+ - var_in: LMA (mg mm-2)
+   unit_in: unknown
+   trait_name: unknown
+   entity_type: unknown
+   value_type: unknown
+   basis_of_value: unknown
+   replicates: unknown
+   methods: unknown
+ - var_in: Leaf nitrogen (mg mg-1)
+   unit_in: unknown
+   trait_name: unknown
+   entity_type: unknown
+   value_type: unknown
+   basis_of_value: unknown
+   replicates: unknown
+   methods: unknown
+ - var_in: leaf size (mm2)
+   unit_in: unknown
+   trait_name: unknown
+   entity_type: unknown
+   value_type: unknown
+   basis_of_value: unknown
+   replicates: unknown
+   methods: unknown
```

------------------------------------------------------------------------
@@ -401,15 +401,15 @@ If the units being read in for a specific trait differ from those defined for th

#### **Final steps**

- ##### **double check the metadata.yml file**
+ ##### **Double check the metadata.yml file**

You should now have a completed `metadata.yml` file, with no `unknown` fields.

You'll notice five sections we haven't used, `contexts`, `substitutions`, `taxonomic_updates`, `exclude_observations`, and `questions`.

These should each contain an `.na` (as in `substitutions: .na`). They will be explored in future lessons.

- ##### **run tests on the metadata file**
+ ##### **Run tests on the metadata file**

Confirm there are no errors in the `metadata.yml` file:

@@ -421,7 +421,7 @@ This *should* result in the following output:

[\[ FAIL 0 \| WARN 0 \| SKIP 0 \| PASS 79 \]]{style="color:blue;"}\

- ##### **add dataset to the database** {#build_pipline}
+ ##### **Add dataset to the database** {#build_pipeline}

Next, add the dataset_id to the build file, then rebuild the database:

Expand All @@ -430,7 +430,7 @@ build_setup_pipeline(method = "base", database_name = "traits.build_database")
source("build.R")
```

- ##### **build dataset report**
+ ##### **Build dataset report**

As a final step, build a report for the study

40 changes: 20 additions & 20 deletions tutorial_dataset_2.qmd
@@ -18,7 +18,7 @@ Before you begin this tutorial, ensure you have installed traits.build, cloned t

- Understand the importance of having the [dataset pivot](#dataset_pivot).

- ### New Functions Introduced
+ ### New functions introduced

- metadata_add_substitution

@@ -125,14 +125,14 @@ Then rename your columns to match those in use:
locations <-
locations %>%
rename(
-     `longitude (deg)` = long,
-     `latitude (deg)` = lat,
-     `description` = vegetation,
-     `elevation (m)` = elevation,
-     `precipitation, MAP (mm)` = MAP,
-     `soil P, total (mg/kg)` = `soil P`,
-     `soil N, total (ppm)` = `soil N`,
-     `geology (parent material)` = `parent material`
+     `longitude (deg)` = long,
+     `latitude (deg)` = lat,
+     `description` = vegetation,
+     `elevation (m)` = elevation,
+     `precipitation, MAP (mm)` = MAP,
+     `soil P, total (mg/kg)` = `soil P`,
+     `soil N, total (ppm)` = `soil N`,
+     `geology (parent material)` = `parent material`
)
```

@@ -407,7 +407,7 @@ custom_R_code: '
mutate(
across(c("TRAIT Leaf Dry Mass UNITS g"), ~na_if(.x,0))
) %>%
- group_by(name_original) %>%
+ group_by(name_original) %>%
mutate(
across(c("TRAIT Growth Form CATEGORICAL EP epiphyte (mistletoe) F fern G grass H herb S shrub T tree V vine"), replace_duplicates_with_NA)
) %>%
@@ -427,23 +427,23 @@ Then rebuild the database and look at the output in the traits table for one of
source("build.R")
traits.build_database$traits %>%
- filter(dataset_id == "tutorial_dataset_2") %>%
+ filter(dataset_id == "tutorial_dataset_2") %>%
filter(taxon_name == "Actinotus minor") %>% View()
dataset_id taxon_name observation_id trait_name value unit entity_type location_id
- <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
- 1 tutorial_dataset_2 Actinotus minor 010 leaf_area 18.8 mm2 population 02
- 2 tutorial_dataset_2 Actinotus minor 010 leaf_dry_mass 7 mg population 02
- 3 tutorial_dataset_2 Actinotus minor 010 leaf_mass_per_area 344.827586206897 g/m2 population 02
- 4 tutorial_dataset_2 Actinotus minor 011 leaf_area 75.9 mm2 population 03
- 5 tutorial_dataset_2 Actinotus minor 011 leaf_dry_mass 7 mg population 03
- 6 tutorial_dataset_2 Actinotus minor 011 leaf_mass_per_area 89.2857142857143 g/m2 population 03
- 7 tutorial_dataset_2 Actinotus minor 012 plant_growth_form herb NA species NA
+ <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
+ 1 tutorial_dataset_2 Actinotus minor 010 leaf_area 18.8 mm2 population 02
+ 2 tutorial_dataset_2 Actinotus minor 010 leaf_dry_mass 7 mg population 02
+ 3 tutorial_dataset_2 Actinotus minor 010 leaf_mass_per_area 344.827586206897 g/m2 population 02
+ 4 tutorial_dataset_2 Actinotus minor 011 leaf_area 75.9 mm2 population 03
+ 5 tutorial_dataset_2 Actinotus minor 011 leaf_dry_mass 7 mg population 03
+ 6 tutorial_dataset_2 Actinotus minor 011 leaf_mass_per_area 89.2857142857143 g/m2 population 03
+ 7 tutorial_dataset_2 Actinotus minor 012 plant_growth_form herb NA species NA
```

The measurements for the three numeric traits from a single location share a common `observation_id`, as they are all part of an observation of a common entity (a specific population of *Actinotus minor*) at a single location at a single point in time. However, the row with the plant growth form measurement has a separate `observation_id`, reflecting that this is an observation of a different entity (the taxon *Actinotus minor*).

- ##### **build dataset report**
+ ##### **Build dataset report**

As a final step, build a report for the study
