Overview
This GitHub issue tracks progress toward keeping three sources of metadata in sync: the metadata in Harvard Dataverse; the metadata included in each dataset's and file's "Datacite" export; and the metadata that we expect DataCite to have. The goal applies both now and whenever we change the metadata we need to send to DataCite.
Keeping these sources in sync will become more important as we make changes to Dataverse that help users describe their data (e.g. IQSS/dataverse-pm#127), and as other folks, like NIH GREI groups, move forward with plans to evaluate those changes by looking at the metadata that DataCite has in order to measure the quality of the repository's dataset metadata (e.g. IQSS/dataverse-pm#420, IQSS/dataverse-pm#411, IQSS/dataverse-pm#419).
Background
This GitHub issue replaces the narrower GitHub issue at #146, which has conversations and examples about how those three sources of metadata are not in sync, and conversations about potential solutions for syncing those three sources.
This GitHub issue is a follow-up to IQSS/dataverse#5144, which was closed because the release notes of each Dataverse software version include instructions for sending updated metadata to DataCite. We've found that for the folks running several Dataverse installations, this wasn't enough.
For Harvard Dataverse in particular, keeping these metadata sources in sync has been too resource-intensive. However, work done over the years may make it easier or less resource-intensive to do this reliably.
Participants
- Leads: @jggautier and either @landreev
Tasks
- @jggautier to follow up with Leonid to ask who could do this work when it's prioritized
- Look for cases where DataCite doesn't have the metadata we expect and cases where the "Datacite" export doesn't have the metadata we expect.
- Try the database query ideas shared later in this GitHub issue. See the Google Sheet with the database query, the list of datasets it produces, and notes about missing metadata and broken DOIs.
- Review known analyses of the Harvard Dataverse metadata that DataCite has, such as Ted Habermann's Google Colab notebook, done as part of the NIH GREI's Metadata Game Changers task group
- Investigate how to sync the current metadata in Harvard Dataverse; the metadata included in each dataset's and file's "Datacite" export; and the metadata that we expect DataCite to have. Also investigate how to ensure that these sources stay in sync for already-published datasets and files when we make changes to the metadata that the repository sends to DataCite. This investigation might lead to additional tasks.
- Based on the investigation, sync the current metadata in Harvard Dataverse; the metadata included in each dataset's and file's "Datacite" export; and the metadata that we expect DataCite to have. Then demonstrate that the metadata in the three sources is in sync.
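As a starting point for the "look for cases" and "demonstrate" tasks above, here is a minimal Python sketch of how a three-way comparison might work once the metadata for one dataset has been collected from each source. It is only an illustration under assumptions: the function names (`normalize`, `find_mismatches`), the source labels, and the flat field names are all hypothetical, and real records from the repository, the "Datacite" export, and DataCite would first need to be mapped onto a shared set of fields. Fetching the records is left out on purpose.

```python
from typing import Dict, List, Optional

# Hypothetical flat record: field name -> value (None or absent = missing).
Record = Dict[str, Optional[str]]


def normalize(value: Optional[str]) -> Optional[str]:
    """Collapse whitespace and case so trivial differences are not reported."""
    if value is None:
        return None
    return " ".join(value.split()).casefold()


def find_mismatches(sources: Dict[str, Record]) -> List[dict]:
    """Return one entry per field whose normalized value differs across sources.

    `sources` maps a source label (e.g. "repository", "export", "datacite")
    to that source's record for a single dataset or file.
    """
    # Compare every field that appears in at least one source.
    fields = sorted({field for record in sources.values() for field in record})
    mismatches = []
    for field in fields:
        values = {name: record.get(field) for name, record in sources.items()}
        if len({normalize(v) for v in values.values()}) > 1:
            mismatches.append({"field": field, "values": values})
    return mismatches


if __name__ == "__main__":
    # Made-up values that mirror the kinds of problems in the related issues:
    # a parenthesized affiliation and a relationType that DataCite lacks.
    example = {
        "repository": {"title": "Survey Data", "affiliation": "Harvard University",
                       "relationType": "IsPartOf"},
        "export": {"title": "Survey Data", "affiliation": "(Harvard University)",
                   "relationType": "IsPartOf"},
        "datacite": {"title": "Survey  Data", "affiliation": "(Harvard University)"},
    }
    for mismatch in find_mismatches(example):
        print(mismatch["field"], mismatch["values"])
```

Treating a missing field as `None` means a field that one source lacks entirely is reported alongside fields whose values differ, which matches the two kinds of cases the tasks ask about. A report with no mismatches across all published datasets and files would be one way to demonstrate that the three sources are in sync.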
Related
- Resend metadata to PID providers when metadata schema used to register PIDs is modified dataverse#5144
- Affiliations entered in affiliation fields are parenthesized in "Datacite" and Schema.org exports dataverse#9330
- QDR-DataCite Scaling dataverse#11832
- Send to DataCite the relationType metadata of files that have PIDs #146
Timeline
- Start: October 2025
- End: To be determined; sometime before the end of FY 2026 (June 30, 2026)