Ensure that we're able to keep in sync the repository's metadata, the metadata in the "Datacite" export, and the metadata we expect DataCite to have #409

@jggautier

Overview

This GitHub issue tracks progress towards the goal of making sure that we're able to keep in sync the metadata in Harvard Dataverse; the metadata included in each dataset's and file's "Datacite" export; and the metadata that we expect DataCite to have, now and when we make changes to the metadata we need to send to DataCite.

Ensuring this will be more important as we make changes to Dataverse that help users describe their data (e.g. IQSS/dataverse-pm#127), and as other folks, like NIH GREI groups, move forward with plans to evaluate those changes by using the metadata that DataCite has to measure the quality of the repository's dataset metadata (e.g. IQSS/dataverse-pm#420, IQSS/dataverse-pm#411, IQSS/dataverse-pm#419).

Background

This GitHub issue replaces the narrower GitHub issue at #146, which has conversations and examples about how those three sources of metadata are not in sync, and conversations about potential solutions for syncing those three sources.

This GitHub issue is a follow-up to IQSS/dataverse#5144, which was closed because the release notes of each Dataverse software version include instructions for sending updated metadata to DataCite. We've found that for the folks running several Dataverse installations, this wasn't enough.

For Harvard Dataverse in particular, keeping these metadata sources in sync has been too resource-intensive. But work that has been done over the years may make it easier or less resource-intensive to do this reliably.

Participants

Tasks

  • @jggautier to follow up with Leonid to ask who could do this work when it's prioritized
  • Look for cases where DataCite doesn't have the metadata we expect and cases where the "Datacite" export doesn't have the metadata we expect.
  • Investigate how to sync the current metadata in Harvard Dataverse; the metadata included in each dataset's and file's "Datacite" export; and the metadata that we expect DataCite to have. Also investigate how to ensure that these sources stay in sync for already-published datasets and files when we make changes to the metadata that the repository sends to DataCite. This investigation might lead to additional tasks.
  • Based on the investigation, sync the current metadata in Harvard Dataverse; the metadata included in each dataset's and file's "Datacite" export; and the metadata that we expect DataCite to have. Then demonstrate that the metadata in the three sources is synced.
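
As a rough illustration of what "look for cases where a source doesn't have the metadata we expect" could mean in practice, here is a minimal sketch in Python. It assumes the three records for one dataset have already been retrieved and flattened into simple field-to-value dictionaries; how they are retrieved is left out. The source labels (`repository`, `export`, `datacite`), the function name `find_mismatches`, and the sample field names and values are all hypothetical and only for illustration. They are not part of Dataverse or DataCite.

```python
from typing import Dict, List, Optional

# Hypothetical labels for the three metadata sources named in this issue.
SOURCES = ("repository", "export", "datacite")


def find_mismatches(records: Dict[str, Dict[str, Optional[str]]]) -> List[str]:
    """Return the sorted names of fields whose values differ across sources.

    A field that is missing from one source counts as a mismatch.
    """
    # Collect every field name that appears in at least one source.
    fields = set()
    for source in SOURCES:
        fields.update(records[source].keys())

    mismatched = []
    for field in sorted(fields):
        # A missing field yields None, so it differs from any real value.
        values = {records[source].get(field) for source in SOURCES}
        if len(values) > 1:
            mismatched.append(field)
    return mismatched


# Made-up sample records for a single dataset.
example = {
    "repository": {"title": "A", "publicationYear": "2024", "subject": "Social Sciences"},
    "export": {"title": "A", "publicationYear": "2024"},
    "datacite": {"title": "A", "publicationYear": "2023"},
}
print(find_mismatches(example))
```

An empty result for every dataset and file would be one possible way to demonstrate that the three sources are synced, though the real comparison would need to account for how each source structures and normalizes its metadata.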

Related

Timeline

  • Start: October 2025
  • End: To be determined; sometime before the end of FY 2026 (June 30, 2026)
