pds-deep-registry-archive produces incomplete SIPs for data recently uploaded to the Registry #202

Open
tbarnes4 opened this issue Mar 5, 2025 · 3 comments

tbarnes4 commented Mar 5, 2025

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

Three possibly related issues. I only see them with bundles that were first registered in the MCP registry, not with bundles migrated from the JPL registry.

(1) The checksum manifest, transfer manifest, and SIP tables that are produced contain only the bundle product, i.e. the bundle label (e.g. bundle.xml) and readme.txt.

(2) In some instances, the checksum and transfer manifest tables list "." or "/." as the filename instead of "bundle.xml" or "/bundle.xml".

(3) I can run pds-deep-registry-archive on collections without the utility erroring out, and I get results similar to (1) and (2).

Tagging @plawton-umd, who registered the data, in case there is a problem with how data is being registered in the new MCP registry.

🕵️ Expected behavior

(1) I expected complete manifest files that recursively include every product in the bundle.

(2) I expected proper filenames and paths.

(3) I expected this to fail, since the tool states it should only be run on bundles. That said, I do like the option to run it on collections and/or on the bundle product alone.
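Issue (3) comes down to telling a bundle LIDVID apart from a collection LIDVID. Under the usual NASA PDS4 URN layout, a bundle LID has four colon-separated fields (urn:nasa:pds:nh_swap) and a collection LID adds a fifth (urn:nasa:pds:nh_sdc:kem1_cal). The following is a minimal sketch of such a check, assuming that layout; split_lidvid, product_level, and require_bundle are hypothetical names, not functions from pds-deep-registry-archive.

```python
from __future__ import annotations


def split_lidvid(lidvid: str) -> tuple[str, str | None]:
    """Split 'LID::VID' into (LID, VID); VID is None if there is no '::'."""
    lid, sep, vid = lidvid.partition("::")
    return lid, (vid if sep else None)


# Assumed NASA PDS4 URN depths: urn:nasa:pds:<bundle>[:<collection>[:<product>]]
_LEVELS = {4: "bundle", 5: "collection", 6: "product"}


def product_level(lidvid: str) -> str:
    """Return 'bundle', 'collection', or 'product' based on the LID depth."""
    lid, _vid = split_lidvid(lidvid)
    fields = lid.split(":")
    if fields[:3] != ["urn", "nasa", "pds"] or len(fields) not in _LEVELS:
        raise ValueError(f"unrecognized PDS4 LIDVID: {lidvid!r}")
    return _LEVELS[len(fields)]


def require_bundle(lidvid: str) -> None:
    """Hypothetical guard: refuse anything that is not a bundle LIDVID."""
    level = product_level(lidvid)
    if level != "bundle":
        raise ValueError(f"{lidvid} is a {level}, not a bundle")
```

With this, require_bundle("urn:nasa:pds:nh_sdc:kem1_cal::1.0") would raise, while the bundle LIDVIDs in this issue pass. Whether the tool should reject collections or support them deliberately is the open question in (3); the sketch only shows that the distinction is cheap to make.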

📜 To Reproduce

For an example of (1), run the command:

pds-deep-registry-archive -s PDS_SBN urn:nasa:pds:nh_swap::1.0

  • nh_swap_v1.0_20250305_checksum_manifest_v1.0.tab

ea217ff6dbc8ef849cdad32d92767377 bundle.lblx
9941da1263bb1d4de128d1e7b57b6c58 readme.txt

  • nh_swap_v1.0_20250305_sip_v1.0.tab

ea217ff6dbc8ef849cdad32d92767377 MD5 https://pdssbn.astro.umd.edu/holdings/pds4-nh_swap-v1.0/bundle.lblx urn:nasa:pds:nh_swap::1.0
9941da1263bb1d4de128d1e7b57b6c58 MD5 https://pdssbn.astro.umd.edu/holdings/pds4-nh_swap-v1.0/readme.txt urn:nasa:pds:nh_swap::1.0

  • nh_swap_v1.0_20250305_transfer_manifest_v1.0.tab

urn:nasa:pds:nh_swap::1.0 /bundle.lblx
urn:nasa:pds:nh_swap::1.0 /readme.txt

Other examples: urn:nasa:pds:nh_lorri::1.0, urn:nasa:pds:nh_sdc::1.0

I get proper results if I run: pds-deep-registry-archive -s PDS_SBN urn:nasa:pds:epoxi_mri::1.0
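One way to spot symptom (1) in existing output is to parse the checksum manifest and check whether every entry sits at the bundle root. This is a rough heuristic sketch, assuming one "<md5> <path>" entry per line with paths relative to the bundle directory, as in the nh_swap example above; the function names are hypothetical, and the heuristic would misfire on a bundle that is legitimately flat.

```python
from __future__ import annotations

import string

_HEX = set(string.hexdigits)


def parse_checksum_manifest(text: str) -> list[tuple[str, str]]:
    """Return (md5, path) pairs, one per non-blank '<md5> <path>' line."""
    entries: list[tuple[str, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split(None, 1)
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected '<md5> <path>', got {line!r}")
        md5, path = fields[0].lower(), fields[1].strip()
        if len(md5) != 32 or not set(md5) <= _HEX:
            raise ValueError(f"line {lineno}: not an MD5 digest: {md5!r}")
        entries.append((md5, path))
    return entries


def looks_bundle_only(entries: list[tuple[str, str]]) -> bool:
    """Heuristic: True when no entry lives below the bundle root directory."""
    return bool(entries) and all("/" not in path.lstrip("/") for _, path in entries)
```

Fed the nh_swap checksum manifest above, parse_checksum_manifest returns two root-level entries (bundle.lblx and readme.txt) and looks_bundle_only returns True; a complete SIP would be expected to include paths inside collection directories.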

For an example of (2), run the command:

pds-deep-registry-archive -s PDS_SBN urn:nasa:pds:gbl-classe::1.0

  • gbl-classe_v1.0_20250304_checksum_manifest_v1.0.tab

830e458398de12ebe3918b4f0dbb1130 .

  • gbl-classe_v1.0_20250304_sip_v1.0.tab

830e458398de12ebe3918b4f0dbb1130 MD5 https://pdssbn.astro.umd.edu/holdings/pds4-gbl-classe-v1.0/bundle.xml urn:nasa:pds:gbl-classe::1.0

  • gbl-classe_v1.0_20250304_transfer_manifest_v1.0.tab

urn:nasa:pds:gbl-classe::1.0 /.

Other example: urn:nasa:pds:nh_documents::4.0
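Symptom (2) shows up as a disagreement between the checksum manifest (filename ".") and the SIP (URL ending in /bundle.xml) for the same MD5. Here is a hedged sketch of a cross-check, assuming whitespace-separated columns laid out as in the examples ("<md5> <path>" and "<md5> MD5 <url> <lidvid>") and assuming each SIP URL ends with the manifest path; all names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlparse


def parse_rows(text: str, maxsplit: int = -1) -> list[list[str]]:
    """Split non-blank lines into whitespace-separated fields."""
    return [line.split(None, maxsplit) for line in text.splitlines() if line.strip()]


def find_path_problems(checksum_text: str, sip_text: str) -> list[str]:
    """Compare a checksum manifest with a SIP table and describe mismatches."""
    problems: list[str] = []
    paths_by_md5: dict[str, list[str]] = {}

    # Checksum manifest rows: <md5> <path relative to the bundle root>
    for md5, path in parse_rows(checksum_text, maxsplit=1):
        path = path.strip()
        if path.strip("/") in ("", "."):
            problems.append(f"{md5}: degenerate filename {path!r}")
        paths_by_md5.setdefault(md5.lower(), []).append(path.lstrip("/"))

    # SIP rows: <md5> <algorithm> <url> <lidvid>
    for md5, _algorithm, url, _lidvid in parse_rows(sip_text):
        url_path = urlparse(url).path
        candidates = paths_by_md5.get(md5.lower(), [])
        if not any(url_path.endswith("/" + p) for p in candidates):
            problems.append(f"{md5}: no manifest path matches SIP URL {url}")
    return problems
```

For the gbl-classe tables above this reports two problems (the "." filename, and a SIP URL with no matching manifest path), while the nh_swap tables produce none.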

For an example of (3), run the command:

pds-deep-registry-archive -s PDS_SBN urn:nasa:pds:nh_sdc:kem1_cal::1.0

  • kem1_cal_v1.0_20250304_checksum_manifest_v1.0.tab

ddcc8ac5cfbebe0ef972c4e1e63e26be collection.lblx
cb8fecf0a29cde3eead887ae59228a5d inventory.csv

  • kem1_cal_v1.0_20250304_sip_v1.0.tab

ddcc8ac5cfbebe0ef972c4e1e63e26be MD5 https://pdssbn.astro.umd.edu/holdings/pds4-nh_sdc:kem1_cal-v1.0/collection.lblx urn:nasa:pds:nh_sdc:kem1_cal::1.0
cb8fecf0a29cde3eead887ae59228a5d MD5 https://pdssbn.astro.umd.edu/holdings/pds4-nh_sdc:kem1_cal-v1.0/inventory.csv urn:nasa:pds:nh_sdc:kem1_cal::1.0

  • kem1_cal_v1.0_20250304_transfer_manifest_v1.0.tab

urn:nasa:pds:nh_sdc:kem1_cal::1.0 /collection.lblx
urn:nasa:pds:nh_sdc:kem1_cal::1.0 /inventory.csv

I get proper results if I run: pds-deep-registry-archive -s PDS_SBN urn:nasa:pds:epoxi_mri:hartley2_photometry::1.0

INFO 👟 PDS Deep Registry-based Archive, version 1.3.0
ERROR 💥 We got an unexpected error; sorry it didn't work out
Traceback (most recent call last):
  File "/Users/tbarnes4/.virtualenvs/pds-deep-archive/lib/python3.10/site-packages/requests/models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/local/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 5 (char 4)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/tbarnes4/.virtualenvs/pds-deep-archive/lib/python3.10/site-packages/pds2/aipgen/registry.py", line 407, in main
    generatedeeparchive(args.url, args.bundle, args.site, not args.include_latest_collection_only)
  File "/Users/tbarnes4/.virtualenvs/pds-deep-archive/lib/python3.10/site-packages/pds2/aipgen/registry.py", line 381, in generatedeeparchive
    bac, title = _comprehendregistry(url, bundlelidvid, allcollections)
  File "/Users/tbarnes4/.virtualenvs/pds-deep-archive/lib/python3.10/site-packages/pds2/aipgen/registry.py", line 235, in _comprehendregistry
    for product in _getproducts(url, collection["id"]):
  File "/Users/tbarnes4/.virtualenvs/pds-deep-archive/lib/python3.10/site-packages/pds2/aipgen/registry.py", line 180, in _getproducts
    matches = r.json()["data"]
  File "/Users/tbarnes4/.virtualenvs/pds-deep-archive/lib/python3.10/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Extra data: line 1 column 5 (char 4)
INFO 👋 Thanks for using this program! Bye!

...
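The final error, "Extra data: line 1 column 5 (char 4)", means the response body was not a single JSON document: Python's decoder parsed a complete value ending at character 4 and then found more non-whitespace text. The traceback shows the failure at matches = r.json()["data"] in _getproducts. What the Registry API actually returned is not shown here, so the sketch below only reproduces the failure mode with a made-up body and offers a hypothetical wrapper (json_or_diagnose) that would report the HTTP status, content type, and start of the body instead of a bare decode error.

```python
import json


def describe_bad_json(body: str, context: int = 40) -> str:
    """Return '' if body parses as one JSON document, else a short diagnostic."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        start = max(exc.pos - context, 0)
        snippet = body[start:exc.pos + context]
        return f"{exc.msg} at char {exc.pos}: {snippet!r}"
    return ""


def json_or_diagnose(response, preview: int = 200):
    """Hypothetical helper: decode a requests-style response or fail loudly.

    requests' JSONDecodeError subclasses ValueError, so catching ValueError
    covers it without importing requests here.
    """
    try:
        return response.json()
    except ValueError as exc:
        raise RuntimeError(
            f"{response.url} returned HTTP {response.status_code} "
            f"({response.headers.get('Content-Type')}): {response.text[:preview]!r}"
        ) from exc
```

For example, describe_bad_json("null}") returns "Extra data at char 4: 'null}'", the same position reported in the traceback. Swapping r.json() for json_or_diagnose(r) in a local copy of registry.py is one way to see what the server sent.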

🖥 Environment Info

  • Operating System: macOS 12.7.6
  • Operating System: Red Hat Enterprise Linux 9.5

📚 Version of Software Used

pds-deep-registry-archive 1.3.0

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

tbarnes4 added the bug (Something isn't working) and needs:triage labels on Mar 5, 2025.
jordanpadams changed the title from "pds-deep-registry-archive produces incomplete files for recently uploaded bundles on MCP" to "pds-deep-registry-archive produces incomplete SIPs for data recently uploaded to the Registry" on Mar 5, 2025.
@jordanpadams (Member)

@tbarnes4 thanks for reporting this. It appears that one of our automated services, which updates the Registry with metadata needed by deep-archive, is down. This somehow got lost in the shuffle and has now been moved to top priority for our team. Apologies for the inconvenience. We will manually kick off the processing so that this gets fixed ASAP. I am out on leave for a week, but @tloubrieu-jpl @sjoshi-jpl @alexdunnjpl will let you know as soon as this is fixed and you can try re-running.

@jordanpadams (Member)

Blocked by NASA-PDS/registry-sweepers#147

@jordanpadams (Member)

Skipping I&T because this will be tested as part of NASA-PDS/registry-sweepers#147
