Version Exported Bags #5

@ThomasThelen

Background

With PR #4, packages whose resource maps encode the file path no longer need a pid-mapping.txt file. If we make this change, we should also consider making any other changes to the package export format at the same time.

Proposals

  1. Deprecate the pid-mapping.txt file in the exported bags.
    This can be done for packages with file paths in the resource map. For older packages, however, the pid-mapping.txt file is probably still required.

  2. Relocate oai-ore.txt into data/
    By moving the file into the data/ directory, we no longer have to declare it as a tag file (although declaring it as one doesn't actually break the BagIt spec). Other systems that ingest our bags won't have to worry about parsing the additional tag file if we do this.
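
A rough sketch of what proposal 2 changes in practice: in BagIt, files under data/ are payload files listed in the payload manifest, while files outside data/ are tag files, which are listed in the optional tag manifest. The helper name `manifest_for` and the sha256 default are assumptions made for illustration, not part of any existing exporter.

```python
from pathlib import PurePosixPath


def manifest_for(bag_relative_path: str, algorithm: str = "sha256") -> str:
    """Return the name of the manifest that would list the given file.

    Hypothetical helper: the path is relative to the bag's base directory.
    """
    parts = PurePosixPath(bag_relative_path).parts
    # Anything inside data/ is payload; everything else is a tag file.
    if len(parts) > 1 and parts[0] == "data":
        return f"manifest-{algorithm}.txt"
    return f"tagmanifest-{algorithm}.txt"


# Current layout: the ORE sits next to bagit.txt as a tag file.
print(manifest_for("oai-ore.txt"))
# Proposed layout: the ORE is ordinary payload.
print(manifest_for("data/oai-ore.txt"))
```

With the file under data/, a consumer that only reads the payload manifest still sees the ORE document.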

I've outlined two possible layouts for a V2 export format. I'm leaning towards the second because it makes a clearer distinction as to which files belong to the data package.

Consider a package named Frog Counts that is exported in the proposed V2 format.

Option 1

  1. The package files sit directly in data/, without an enclosing package directory (compare Option 2)
  2. The ORE document is at the data/ root
<base directory>/
├── bagit.txt
├── bag-info.txt
├── manifest-<algorithm>.txt
└── data/
    ├── oai-ore.txt
    ├── data-file-1.csv
    ├── data-file-2.csv
    ├── data-file-3.hdf
    └── metadata-file-1.xml

Option 2

  1. The data package is placed in a folder within data/
  2. The ORE document is placed within data/
<base directory>/
├── bagit.txt
├── bag-info.txt
├── manifest-<algorithm>.txt
└── data/
    ├── oai-ore.txt
    └── Frog Counts/
        ├── data-file-1.csv
        ├── data-file-2.csv
        ├── data-file-3.hdf
        └── metadata-file-1.xml
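
To make the Option 2 layout concrete, here is a minimal sketch that writes such a bag to disk. The function name `write_option2_bag`, its parameters, the sha256 default, and the bag-info.txt contents are all assumptions for illustration; this is not how GMN or Metacat currently export. It also does not percent-encode special characters in manifest paths, which a real exporter would need to handle.

```python
import hashlib
import tempfile
from pathlib import Path


def write_option2_bag(base_dir, package_name, ore_text, files, algorithm="sha256"):
    """Write an Option 2 style bag and return the sorted payload paths.

    Hypothetical sketch: `files` maps a file name to its content as bytes.
    """
    base = Path(base_dir)
    # The data package gets its own folder inside data/.
    (base / "data" / package_name).mkdir(parents=True, exist_ok=True)

    # Tag files at the bag root.
    (base / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (base / "bag-info.txt").write_text(
        f"External-Description: {package_name}\n", encoding="utf-8"
    )

    # The ORE document is payload at the data/ root, next to the package folder.
    payload = {"data/oai-ore.txt": ore_text.encode("utf-8")}
    for name, content in files.items():
        payload[f"data/{package_name}/{name}"] = content

    # Write each payload file and record its checksum in the payload manifest.
    manifest_lines = []
    for rel_path in sorted(payload):
        (base / rel_path).write_bytes(payload[rel_path])
        digest = hashlib.new(algorithm, payload[rel_path]).hexdigest()
        manifest_lines.append(f"{digest}  {rel_path}")
    (base / f"manifest-{algorithm}.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return sorted(payload)


# Example usage with placeholder content for the Frog Counts package.
with tempfile.TemporaryDirectory() as tmp:
    for path in write_option2_bag(
        tmp, "Frog Counts", "<rdf/>", {"data-file-1.csv": b"site,count\n"}
    ):
        print(path)
```

Because the ORE document lives in data/, it is listed in the payload manifest along with the package files, and no extra tag file needs to be declared.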

Scope of Changes

Changes will have to be made to the software projects in the DataONE ecosystem that handle exporting and importing. These include

  1. GMN
  2. Metacat

I'd like to gather questions, comments, and concerns in this issue. Feel free to reply below.

People interested in this probably include @mbjones, @datadavev, @taojing2002, @amoeba, @csjx
