Add identifiers table to traits.build output #195

Merged Jan 29, 2025 (19 commits)


@ehwenk (Collaborator) commented Jan 28, 2025

Add a dataframe to a traits.build output that documents "identifiers" that link specimens/individuals/etc. across datasets and to museum/herbarium collections.

Six types of identifiers have been added to the traits.build schema, reflecting key identifier types specified by DarwinCore. This is not a comprehensive list of all DarwinCore identifiers, but it seems like a good starting point; rather than adding more now, traits.build has been coded NOT to throw an error if a different identifier_type is used.

Note: catalog_number is for non-universal identifiers that represent an individual researcher's plant tag system, while collectionID and materialSampleID are meant to be universally unique identifiers.
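As a rough illustration of the "suggested list, no error" behaviour described above, here is a minimal sketch in Python (not the package's R code). The function name `unsuggested_identifier_types` is hypothetical, and the set below contains only the three identifier types named in this description, not the full six in the schema.

```python
# Illustrative subset only: the three identifier types named in this PR
# description. The actual schema lists six; the others are not reproduced here.
SUGGESTED_IDENTIFIER_TYPES = {"catalog_number", "collectionID", "materialSampleID"}


def unsuggested_identifier_types(identifier_types):
    """Return identifier types that are not in the suggested list.

    Hypothetical helper: it never raises, mirroring the stated design that an
    unlisted identifier_type must not stop a dataset from building.
    """
    return sorted(set(identifier_types) - SUGGESTED_IDENTIFIER_TYPES)


# Unlisted types are reported back to the caller, not rejected.
print(unsuggested_identifier_types(["catalog_number", "field_tag"]))
```

Returning the unlisted types rather than raising leaves the caller free to log them or ignore them, which matches treating the list as suggestions rather than a controlled vocabulary.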

Notes:

  • coded so that metadata.yml files are not required to have an identifiers section, so people already using traits.build do not need to retrofit their metadata files
  • there is a corresponding branch in {austraits} that makes changes to functions to accommodate the changed output structure. The DESCRIPTION specifies that {austraits} must be installed from this commit - and will need to be changed back to master when new releases of {austraits} and {traits.build} are simultaneously released.
  • pull request includes: tests, new function to add identifiers to metadata files
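The first note above (an `identifiers` section is optional) can be sketched as follows. This is a Python illustration under stated assumptions, not the package's R implementation: the function name `identifier_entries`, the field names `identifier_type` and `source_column`, and the shape of the parsed metadata are all hypothetical.

```python
# Hypothetical field names for one identifiers entry; the real metadata.yml
# layout used by traits.build may differ.
IDENTIFIER_FIELDS = ("identifier_type", "source_column")


def identifier_entries(metadata):
    """Return identifier entries from parsed metadata.

    If the 'identifiers' section is absent or empty, return an empty list so
    that downstream processing can proceed unchanged.
    """
    entries = metadata.get("identifiers") or []
    return [{field: entry.get(field) for field in IDENTIFIER_FIELDS} for entry in entries]


# An existing metadata file with no identifiers section still yields a valid,
# empty result.
old_style_metadata = {"dataset": {"dataset_id": "Test_2023_1"}}
print(identifier_entries(old_style_metadata))
```

Treating a missing section the same as an empty one is what lets existing metadata files keep working without being retrofitted.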

I don't understand why changes that were previously pushed to master are appearing here (for instance, the renaming of functions). Confirming that this pull request is from the add_specimen_ID branch into develop.

* Code to add a table of identifiers is working - tested so far with a single column of identifiers, but workflow should work for any number of identifiers

* Next steps:
- test with multiple identifier columns
- rework `metadata_create_template` to add identifiers
- create function `metadata_add_identifiers` to propagate identifier section of metadata.yml
- write tests
- edits to lots of austraits functions since now list is 1 longer; need to create function to join in identifiers
Add identifiers to remaining part of schema
Add new function for adding identifiers to metadata files
To avoid people having to retrofit existing metadata files to include identifiers, if identifiers is missing from the metadata file, a blank metadata tibble is added during processing.
3 old tests are not passing, but some failures might be because the tests run against the new austraits while the accompanying traits.build changes are not on this branch.
All old tests now passing with new `identifiers` table.

Next need to add tests and identifiers permutations to test datasets.
- For test datasets, add fake identifiers to one dataset (Test_2023_1)
- For remaining datasets, have different metadata configurations, some with identifiers included vs excluded
further edits to schema to capture most likely identifier types used.

At least for a while, these will be a suggested list rather than a controlled vocabulary: any identifier type can be used, but only these will be suggested by the function `metadata_add_identifiers`.
Added another test, confirming datasets build with "non-controlled identifiers" - good to include, since we don't yet fully know what people want to have

Few other formatting changes, update documentation
- fix: the identifiers table contained rows for which there wasn't a trait value (blanks were filtered from the traits table only after the identifiers table had been detached as its own table)
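One way to picture the last fix in the list above, sketched in Python rather than the package's R code: once the identifiers table is detached, it has to be restricted to observations that still have a trait value. The function name `drop_orphan_identifiers`, the join key `observation_id`, and the `value` column are assumptions made for illustration.

```python
def drop_orphan_identifiers(traits, identifiers, key="observation_id"):
    """Keep only identifier rows whose key matches a trait row with a value.

    Hypothetical sketch: 'traits' and 'identifiers' are lists of dicts, and a
    trait value counts as blank when it is None or an empty string.
    """
    keys_with_values = {
        row[key] for row in traits if row.get("value") not in (None, "")
    }
    return [row for row in identifiers if row[key] in keys_with_values]


traits = [
    {"observation_id": "01", "value": "1.2"},
    {"observation_id": "02", "value": ""},  # blank, later filtered from traits
]
identifiers = [
    {"observation_id": "01", "identifier_type": "catalog_number"},
    {"observation_id": "02", "identifier_type": "catalog_number"},
]
print(drop_orphan_identifiers(traits, identifiers))
```

Filtering by key after the split keeps the two tables consistent regardless of the order in which blanks are removed from the traits table.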
@ehwenk changed the title from "Add specimen" to "Add identifiers table to traits.build output" Jan 28, 2025
@ehwenk requested a review from dfalster January 28, 2025 00:17
Specify austraits branch that goes with this traits.build branch
Removing automatically added zeros
- re-add some file renamings that were mistakenly reverted
- add `austraits::` in report dataset
- rework use of `identifiers_tmp` and `identifiers` in process.R
- add dataset_test tests for identifiers
@ehwenk merged commit 27a6517 into develop Jan 29, 2025
3 checks passed
@ehwenk deleted the add_specimen_ID branch January 29, 2025 00:29