Create new pds-deep-archive program and improve performance (#26)
* Resolutions for #13 and #21
- Resolve #21 with a new driver program `aipsip` that generates the AIP and then uses it to make the SIP, leaving all output in the current working directory (along with two, count 'em, *two*, PDS labels for the price of one!).
- Updates the Python `setuptools` metadata to generate the new `aipsip` (helps with #21).
- Refactors logging and command-line argument setup (also for #21).
- Unifies logging between `aipgen` and `sipgen` with the new `aipsip` so that there are `--debug` and `--quiet` options; without either you get a nominal amount of "hand-holding" of output.
- Resolve #13: instead of billions of redundant XML parses and XPath lookups, use a local `sqlite3` database and LRU caching.
- Factor out XML parsing from `aipgen` and `sipgen` so we can apply caching.
- Clear up logging messages so we can know what's calling what.
- Create a temp DB in `sipgen` and populate it with mappings from lidvids to XML files for rapid lookups
- But see also #25 for other uses of that DB.
- Add standardized `--version` arguments for all three programs.
With these changes, running `sipgen` on my Mac¹ can process a 272 GiB `insight_cameras` export in 1:03. On `pdsimg-int1`², it handles the 1.5 TiB `insight_cameras` dataset in under 4 hours.
Footnotes:
- ¹2.4 GHz 8-core Intel Core i9, SSD
- ²2.3 GHz 8-core Intel Xeon Gold 6140, unknown drive
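The lidvid-to-file index with LRU caching described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `labels` table, the `build_index` and `make_lookup` helpers, and the example lidvid are all hypothetical names chosen for the sketch.

```python
import sqlite3
import tempfile
from functools import lru_cache
from pathlib import Path


def build_index(pairs, db_path):
    """Create a SQLite table mapping each lidvid to the XML file that defines it.

    Table and column names are assumptions made for this sketch.
    """
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS labels "
        "(lidvid TEXT PRIMARY KEY, xml_path TEXT NOT NULL)"
    )
    con.executemany("INSERT OR REPLACE INTO labels VALUES (?, ?)", pairs)
    con.commit()
    return con


def make_lookup(con, maxsize=1024):
    """Return a lookup function whose results are memoized with an LRU cache."""

    @lru_cache(maxsize=maxsize)
    def lookup(lidvid):
        row = con.execute(
            "SELECT xml_path FROM labels WHERE lidvid = ?", (lidvid,)
        ).fetchone()
        return row[0] if row else None

    return lookup


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical lidvid and path, for illustration only.
        pairs = [("urn:nasa:pds:example_bundle::1.0", "example/bundle.xml")]
        con = build_index(pairs, Path(tmp) / "index.db")
        lookup = make_lookup(con)
        print(lookup("urn:nasa:pds:example_bundle::1.0"))
        con.close()
```

Repeated lookups of the same lidvid are answered from the in-memory cache, and the database handles the first lookup of each key without re-parsing any XML.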
* Improvements for usability and bug fixes for validate errors
* After running validate, there were a few minor fixes that needed to be implemented.
* Commented out / removed several CLI options for the time being until functionality is fully developed.
* Updated file naming to take into account bundle versioning separate from the AIP/SIP version
* Updated docs per new pds-deep-archive script which combines aipgen and sipgen.
Refs #21
Co-authored-by: Jordan Padams <[email protected]>
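The new naming scheme can be read off the output names in the updated docs, such as `ladee_mission_bundle_v1.0_checksum_manifest_v1.0.tab`: the bundle version comes before the file kind and the AIP/SIP version after it. A small sketch of that pattern follows; the `output_name` helper and its parameter names are hypothetical, and the real code may build names differently.

```python
def output_name(bundle, bundle_version, kind, package_version, ext):
    """Build an output file name carrying both the bundle and package versions.

    Pattern inferred from the documented examples:
    <bundle>_v<bundle version>_<kind>_v<AIP/SIP version>.<ext>
    """
    return f"{bundle}_v{bundle_version}_{kind}_v{package_version}.{ext}"


if __name__ == "__main__":
    for kind, ext in [
        ("checksum_manifest", "tab"),
        ("transfer_manifest", "tab"),
        ("aip", "xml"),
        ("sip", "tab"),
        ("sip", "xml"),
    ]:
        print(output_name("ladee_mission_bundle", "1.0", kind, "1.0", ext))
```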
- INFO 🏃♀️ Starting AIP generation for test/data/ladee_test/ladee_mission_bundle/LADEE_Bundle_1101.xml
- INFO 🧾 Writing checksum manifest for /Users/kelly/Documents/Clients/JPL/PDS/Development/pds-deep-archive/test/data/ladee_test/ladee_mission_bundle to ladee_mission_bundle_checksum_manifest_v1.0.tab
- INFO 🚢 Writing transfer manifest for /Users/kelly/Documents/Clients/JPL/PDS/Development/pds-deep-archive/test/data/ladee_test/ladee_mission_bundle to ladee_mission_bundle_transfer_manifest_v1.0.tab
- INFO 🏷 Writing AIP label to ladee_mission_bundle_aip_v1.0.xml
- INFO 🎉 Success! All done, files generated:
- INFO • Checksum manifest: ladee_mission_bundle_checksum_manifest_v1.0.tab
- INFO • Transfer manifest: ladee_mission_bundle_transfer_manifest_v1.0.tab
- INFO • XML label: ladee_mission_bundle_aip_v1.0.xml
- INFO 👋 Thanks for using this program! Bye!
-
- 3. You can also run sipgen. Here is a basic usage example using data in the test directory::
- Specify PDS4 Information Model version to generate
- SIP. Must be 1.13.0.0+; default 1.13.0.0
-
-
  Documentation
  =============

- Additional documentation is available in the ``docs`` directory and also TBD.
+ Installation and Usage information can be found in the documentation online at https://nasa-pds-incubator.github.io/pds-deep-archive/ or the latest version is maintained under the ``docs`` directory.
- INFO 🏃♀️ Starting AIP generation for test/data/ladee_test/ladee_mission_bundle/LADEE_Bundle_1101.xml
- INFO 🧾 Writing checksum manifest for /Users/kelly/Documents/Clients/JPL/PDS/Development/pds-deep-archive/test/data/ladee_test/ladee_mission_bundle to ladee_mission_bundle_checksum_manifest_v1.0.tab
- INFO 🚢 Writing transfer manifest for /Users/kelly/Documents/Clients/JPL/PDS/Development/pds-deep-archive/test/data/ladee_test/ladee_mission_bundle to ladee_mission_bundle_transfer_manifest_v1.0.tab
- INFO 🏷 Writing AIP label to ladee_mission_bundle_aip_v1.0.xml
- INFO 🎉 Success! All done, files generated:
- INFO • Checksum manifest: ladee_mission_bundle_checksum_manifest_v1.0.tab
- INFO • Transfer manifest: ladee_mission_bundle_transfer_manifest_v1.0.tab
- INFO • XML label: ladee_mission_bundle_aip_v1.0.xml
- INFO 👋 Thanks for using this program! Bye!
-
- This creates three output files in the current directory as part of the AIP:
+ INFO 👟 PDS Deep Archive, version 0.0.0
+ INFO 🏃♀️ Starting AIP generation for test/data/ladee_test/mission_bundle/LADEE_Bundle_1101.xml

- • ``ladee_mission_bundle_checksum_manifest_v1.0.tab``, the checksum manifest
- • ``ladee_mission_bundle_transfer_manifest_v1.0.tab``, the transfer manifest
- • ``ladee_mission_bundle_aip_v1.0.xml``, the label for these two files
+ INFO 🎉 Success! AIP done, files generated:
+ INFO • Checksum manifest: ladee_mission_bundle_v1.0_checksum_manifest_v1.0.tab
+ INFO • Transfer manifest: ladee_mission_bundle_v1.0_transfer_manifest_v1.0.tab
+ INFO • XML label for them both: ladee_mission_bundle_v1.0_aip_v1.0.xml

- The checkum manifest may then be fed into ``sipgen`` to create the SIP::
+ INFO 🏃♀️ Starting SIP generation for test/data/ladee_test/mission_bundle/LADEE_Bundle_1101.xml
+ INFO 🎉 Success! From /Users/jpadams/Documents/proj/pds/pdsen/workspace/pds-deep-archive/test/data/ladee_test/mission_bundle/LADEE_Bundle_1101.xml, generated these output files:
+ INFO • SIP Manifest: ladee_mission_bundle_v1.0_sip_v1.0.tab
+ INFO • XML label for the SIP: ladee_mission_bundle_v1.0_sip_v1.0.xml

- This program will print::
+ INFO 👋 That's it! Thanks for making an AIP and SIP with us today. Bye!

- ⚙︎ ``sipgen`` — Submission Information Package (SIP) Generator, version 0.0.0
- 🎉 Success! From test/data/ladee_test/ladee_mission_bundle/LADEE_Bundle_1101.xml, generated these output files:
- • Manifest: ladee_mission_bundle_sip_v1.0.tab
- • Label: ladee_mission_bundle_sip_v1.0.xml
+ This creates 5 output files in the current directory as part of the AIP and SIP Generation:

- And two new files will appear in the current directory:
+ • ``ladee_mission_bundle_v1.0_checksum_manifest_v1.0.tab``, the checksum manifest
+ • ``ladee_mission_bundle_v1.0_transfer_manifest_v1.0.tab``, the transfer manifest
+ • ``ladee_mission_bundle_v1.0_aip_v1.0.xml``, the label for these two files

- • ``ladee_mission_bundle_sip_v1.0.tab``, the created SIP manifest as a
+ • ``ladee_mission_bundle_v1.0_sip_v1.0.tab``, the created SIP manifest as a
  tab-separated values file.
- • ``ladee_mission_bundle_sip_v1.0.xml``, an PDS label for the SIP file.
-
- For reference, the full "usage" message from ``aipgen`` is::
-
- usage: aipgen [-h] [-v] IN-BUNDLE.XML
-
- Generate an Archive Information Package or AIP. An AIP consists of three
- files: ➀ a "checksum manifest" which contains MD5 hashes of *all* files in a
- product; ➁ a "transfer manifest" which lists the "lidvids" for files within
- each XML label mentioned in a product; and ➂ an XML label for these two files.
- You can use the checksum manifest file ➀ as input to ``sipgen`` in order to
- create a Submission Information Package.
-
- positional arguments:
- IN-BUNDLE.XML Root bundle XML file to read
-
- optional arguments:
- -h, --help show this help message and exit
- -v, --verbose Verbose logging; defaults False
-
- For reference, the full "usage" message from ``sipgen`` follows::
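The removed ``aipgen`` usage text describes the checksum manifest as MD5 hashes of all files in a product. The sketch below illustrates that idea only. The function names are hypothetical, and the row layout (hash, tab, relative path) is an assumption that may not match the real manifest columns.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data products need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_rows(root):
    """Yield (md5, relative path) for every regular file under root, sorted."""
    root = Path(root)
    for p in sorted(root.rglob("*")):
        if p.is_file():
            yield md5_of(p), p.relative_to(root).as_posix()


def write_manifest(root, out_path):
    """Write one tab-separated row per file; return the number of rows.

    The column order here is an assumption for illustration.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        for digest, rel in checksum_rows(root):
            out.write(f"{digest}\t{rel}\n")
            count += 1
    return count
```

Writing the manifest outside the directory being hashed keeps the manifest itself out of its own listing.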