-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add zstd support for faster tarball creation or extraction in eessi_container.sh #994
Add zstd support for faster tarball creation or extraction in eessi_container.sh #994
Conversation
…ume and/or --save is used and zstd is avaialble
Instance
|
Instance
|
Instance
|
Instance
|
Instance
|
Hmmm, I'm not sure if I need more of the changes from #903 to make things work on Snellius. Nor if I can actually build some CPU-only stuff there. But let's try :) bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 |
Updates by the bot instance
|
Updates by the bot instance
|
Updates by the bot instance
|
Updates by the bot instance
|
Updates by the bot instance
|
New job on instance
|
bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 |
Updates by the bot instance
|
Updates by the bot instance
|
Updates by the bot instance
|
Updates by the bot instance
|
Updates by the bot instance
|
New job on instance
|
This reverts commit 1b408d3.
bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 |
Updates by the bot instance
|
Updates by the bot instance
|
Updates by the bot instance
|
Updates by the bot instance
|
New job on instance
|
bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 |
Updates by the bot instance
|
Updates by the bot instance
|
Updates by the bot instance
|
Updates by the bot instance
|
Updates by the bot instance
|
New job on instance
|
This reverts commit 17cff91.
bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90 |
Updates by the bot instance
|
Updates by the bot instance
|
Updates by the bot instance
|
Updates by the bot instance
|
Updates by the bot instance
|
New job on instance
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I'll give it a try later today.
if [[ -d ${SAVE} ]]; then | ||
# assume SAVE is name of a directory to which tarball shall be written to | ||
# name format: tmp_storage-{TIMESTAMP}.tgz | ||
ts=$(date +%s) | ||
TGZ=${SAVE}/tmp_storage-${ts}.tgz | ||
if [[ -x "$(command -v zstd)" ]]; then | ||
TARBALL=${SAVE}/tmp_storage-${ts}.zst |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is .zst
the standard extensions for zstd-compressed tarballs?
I've never seen that anywhere...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it is. https://en.wikipedia.org/wiki/Zstd
I believe if your tar is build with zstd support, you can do -xaf to inflate and it will infer from the extension that it needs to use zstd. But: I guess you could have zstd on your system and not have your tar compiled with zstd support, so my current implementation is more portable.
if [[ "${RESUME}" == *.tgz ]]; then | ||
tar xf ${RESUME} -C ${EESSI_HOST_STORAGE} | ||
# Add support for resuming from zstd-compressed tarballs | ||
elif [[ "${RESUME}" == *.zst && -x "$(command -v zstd)" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not check once if zstd
is supported, define a constant, and use that everywhere:
if [[ -x $(command -v zstd) ]]; then
USE_ZSTD=true
else
USE_ZSTD=false
fi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, you can, but then you have to check if [[ "${USE_ZSTD}" == 'true' ]]
(remember, this is bash, not python, there is no boolean in bash.
Since you have to do a test anyway, the current is explicit, equally short, and execution time of command -v
is negligible. No point in storing it in an environment variable, then having to remember what the value to check for is (true or True or whatever).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works as intended.
Tested with tarballs of about 14 GB size. Resuming with gzip2 took about 3-4 minutes, saving about 12-13 minutes. With zstd
resuming took about 2 minutes, saving about 2-3 minutes.
Nice! 👍
4623765
into
EESSI:2023.06-software.eessi.io
PR merged! Moved |
1 similar comment
PR merged! Moved |
PR merged! Moved |
PR merged! Moved |
When --resume and/or --save is used and zstd is available.
The builds in this PR show the benefit on a system that has zstd available. For SAMtools, the total time for the build job goes down from 12 to 4 minutes (tarballs for saving the container state of the build_step are something like 1.7GB for this build). For pmt, it goes down from 22 to 3.5 minutes (tarballs for saving the container state of the build step are around 3.2 GB for this build). Note also that the tarball size goes down (2.7 GB for pmt for the build_step), so
zstd
is not only faster, it also compresses better. Could also be nice for our file size footprint on the AWS storage.