Use zlib-ng (fast!) rather than mainline stale zlib in binary releases #91349
zlib-ng is an optimized zlib library with better performance on most architectures (with contributions from the likes of Google, Cloudflare, and Intel). It is API-compatible with zlib: https://github.com/zlib-ng/zlib-ng I believe the only platform where we don't use the OS's own zlib is Windows, so I'm tagging this issue Windows. |
If this hasn't happened in the 3.11 betas, this is presumably bumped to 3.12. |
I started taking a look at this, and it seems like we can build it without having to worry about their build system by renaming the relevant files. Running test_zlib, the only failures seem to be tests that check for certain failures that no longer occur, but I've got no idea how important they are. Also no indication of the performance impact, or anything else that may change (e.g. new DLL exports, etc.), but it certainly does seem like a feasible drop-in replacement. |
@zooba which failures are these? I have accumulated quite some experience with the zlib/gzip formats due to working on python-isal, bindings for the ISA-L library that speeds up zlib-compatible compression by rewriting the algorithms in x86 assembly language. It is quite good, but not suitable as a drop-in replacement, hence the python-isal project. Having said that, I'd love to help out with anything zlib-related in CPython. |
I knew I should've kept better track of the changed error messages 😄 Skimming through the tests, I'm pretty sure |
Checking internally for local patches to zlib-ng, this change zlib-ng/zlib-ng@ce6789c appears to fix that problem. I guess it hasn't landed in a zlib-ng release yet. (I don't see it in the 2.0.x branch.) Let's make sure that lands in zlib-ng before using it in our releases, I guess. Otherwise, I don't see any Python code using zlib.error that'd care (internally, or in a large corpus of third-party OSS Python code). It is mostly only ever caught to bail out, or follows this pattern to try the alternate meaning of wbits for the zlib vs. gzip format: https://github.com/urllib3/urllib3/blob/main/src/urllib3/response.py#L102 |
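A minimal sketch of that fallback pattern, assuming payloads labelled "deflate" that may be either zlib-wrapped or raw deflate (the urllib3 case linked above):

import zlib

def decompress_deflate(data: bytes) -> bytes:
    # First assume a zlib-wrapped stream (default wbits).
    try:
        return zlib.decompress(data)
    except zlib.error:
        # Alternate meaning of wbits: negative selects raw deflate;
        # zlib.MAX_WBITS | 16 would select the gzip wrapper instead.
        return zlib.decompress(data, -zlib.MAX_WBITS)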
I don't recommend bypassing the build system, as it is used to determine compiler features and which optimizations to enable. If you only use Visual Studio, then it is best to use CMake to generate your Visual Studio projects. |
Yeah, I figured this would be a bad idea, but we're also not going to run a separate build system every time. We'll run it once and keep the generated files in our source mirror. (This extra work is why I didn't bother setting it all up earlier, and just hacked things enough to make it build.) |
Is this change intended to remain Windows-only? I am interested in packaging zlib_ng for Python. (It will ship |
There's nothing stopping you from making zlib-ng wrapping modules on PyPI if you wanted to (check to see if anyone already has first). It'd allow applications on any Python version you provide wheels for to choose to use the faster library without rebuilding anything themselves.

Linux distros decide what zlib implementation they use for their own libz, so that is not in our control, and we don't want to vendor more third-party libraries there if we can help it. On macOS I assume we're using a platform zlib as well? But it'd be more reasonable to ship our own different one there, as we already have to do that for some other libraries such as OpenSSL. But on Windows, we bring and build it all, so I suggest using zlib-ng on Windows only for starters.

zlib-ng is an API drop-in compatible replacement for zlib. We switched to Chromium's zlib internally at work. It offers a "compatibility mode" that produces bit-identical compressed output to zlib at the cost of some performance. That made it a lot easier to move a huge codebase across tons of different systems over to it, as there were some things making the incorrect assumption that compressed output is canonical based on the input. If Python runs into that kind of issue during beta releases, I'd suggest we step back and look at the Chromium zlib in compatibility mode instead of zlib-ng.

Honestly, people designing new systems are better off using zstandard for all lossless compression purposes. But the reason zlib exists and is used is compatibility with existing systems that only accept that format as their lowest common denominator compressed data format. |
I checked, no one yet... at least not on PyPI. As the python-isal maintainer, I also feel I can do this with rather less effort than someone who is not as experienced with Python's zlibmodule. So I am going to try.
That is an interesting point. Now I have to consider which one of those to wrap. Since zlib-ng is also available in conda (in the conda-forge channel), I think that is the most obvious choice for redistribution.
Depends. Intel's ISA-L igzip application compresses faster and at a better compression ratio than zstd can achieve at those speeds. It decompresses more slowly, though. However, one complete compress-decompress cycle is faster with igzip than with zstd. So for my work (bioinformatics), where a lot of files are run through multiple programs, this is optimal. Also, it is backwards-compatible with gzip, so it is quite a big win. |
Hi. The python-zlib-ng project is going well: https://github.com/pycompression/python-zlib-ng. The first release should be soon. A couple of observations I thought I'd share that are relevant to this issue:
Now these are not deal-breakers when you want a zlib alternative that can be used next to it to enhance performance, but I feel this is less desirable for CPython. A more regularly maintained, stable, and predictable library is more appropriate.
I think this is the way to go based on my experience so far. This does give people on Windows the least surprise. Of course YMMV, but I thought I would add my 2 cents and maybe save someone else some work in trying it out. |
At level 1, our goal is to sacrifice compression ratio for speed, and we do this using Intel's quick deflate algorithm. It is possible to disable this algorithm using a compile-time define. Has anybody done any compression/decompression performance benchmarking with Chromium's zlib? As far as I remember, they weren't focused on improving compression, only decompression. A benchmark from the last release of zlib-ng can be found here; however, it only compares against original mainline zlib. |
Ah, that is equivalent to ISA-L's level 0. I see. Thank you for the hint. The problem at the 4th bullet point seems fixed on the latest develop branch though, so I guess I will just wait it out until there is a new release. I think those new strategies are a key feature of zlib-ng, so I will leave them in for the Python bindings. It does sway my opinion for CPython though. If it can be made more "drop-in" by turning on some options, it seems a good fit. zlib-ng is so easy to build on Windows; that can't be said for every C project out there.
That makes sense for a browser, right? It would be a bit weird if compression were a focus. I don't know whether browsers regularly send compressed data? They do receive it quite a lot. |
I thought I should chime in here and address the point about spotty releases. That is largely my fault and has to do with some health issues I have been going through, combined with a lot of code refactors and a bad strategy for backporting changes to the stable branch that requires me to do a lot of manual work.

There is currently a pre-release for a new 2.0 release, zlib-ng/zlib-ng#1393, and it mostly lacks CI fixes so tests work again (the current plan for that is to copy over the new GitHub Actions files, remove CI runs that are incompatible with 2.0, and hopefully there will not be a lot more to fix manually). That 2.0 release will be the last 2.0 release that we backport changes to; after that, there will only be important fixes if needed.

The plan is to release a 2.1 beta shortly after the next 2.0 release is out. I fully expect there to be bugs in the current 2.1 development branch due to the large amount of refactoring. It has been working perfectly in several production systems for a long time now, but those do not exercise nearly all the various code paths that a full distribution adoption will see. The automated testing for 2.1 is quite extensive though, so I hope we did not miss anything truly important. |
@Dead2 I hope your health issues are resolved soon. Take care!
If I may give some unsolicited advice: don't keep a stable branch with backports. It is an extra maintenance burden. The code is only ever going to be as good as its test suite, which means the branch with the most extensive testing is going to be the most suitable for production. Making regular releases from the development head, while creating a new test for every bug that is reported, is therefore a very sound strategy for producing stable releases without regressions.
I will run the python-zlib-ng test suite on the branch as well. That has 14000+ compatibility tests with zlib proper, so that may also catch it when something is amiss. |
The issue with test_wbits should be fixed in the next stable release of zlib-ng. |
Just wanted to let you know that we are about to release beta2 of zlib-ng 2.1 shortly (within a day or two unless a problem crops up). No significant bugs were found during beta1, but we did pull in a couple of changes that warrant another beta. Also, regarding level 1 trading off more speed for lower compression, it can be disabled by setting the compiler define |
zlib-ng sounds like a good idea. We switched to zlib-ng for PNGs in the last Pillow release because it's much faster (up to 4x at the highest compression). As for file sizes: zlib-ng produces the same file sizes at compression level 0 and bigger files at compression level 1, but if you care about file size, you'll use a higher compression level, which gives more or less the same file size (and zlib-ng is faster). Benchmark graphs are at python-pillow/Pillow#8500 (comment). We had a couple of reports from people whose tests started failing, but that was because they were comparing the bytes of the images instead of the image content/pixels. |
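A small illustration of that pitfall (here, levels 1 and 6 merely stand in for two different zlib implementations, since different compressors can emit different bytes for the same input):

import zlib

payload = b"example pixels " * 1000
a = zlib.compress(payload, 1)  # e.g. output from one zlib implementation
b = zlib.compress(payload, 6)  # e.g. output from another; the bytes may differ

# Fragile: comparing the compressed streams directly (a == b) can break.
# Robust: compare the round-tripped content instead.
assert zlib.decompress(a) == zlib.decompress(b) == payload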
Okay, so let's do it. Do we need to put the sources in the |
I'd love to use the external repo(s) used by builds as we do for zlib. (I think we also have such a repo/branch used by macOS builds as well? Though not for zlib yet.) |
To quote @nmoinvaz
I recommend turning this on. A certain level of compression is expected from zlib level 1. The point of compression is to have smaller files, so suddenly having 30% bigger files may break some people's expectations. Especially in network scenarios, this is not desirable. |
We don't; for macOS installer builds, we use the upstream source releases, occasionally with a patch as needed, for the third-party libs that aren't available from the OS or for which we want a newer version. We currently use the macOS-supplied version of |
I have mixed feelings about that, because level 1 compression in zlib is still so slow (around 50 MB/s or less, depending on the CPU) that it is unusable for many use cases (and it's the main driver of people moving to zstd). Also, please bear in mind that most server CPUs have many cores but the cores are slower (the fastest server CPUs now are half the speed of the average desktop CPU for single-core performance), while on the user front many laptops have dramatically increased the core count and added efficiency cores, but the cores barely break the 1 GHz mark anymore. Performance has gotten worse in the last few years, and zlib is heavily affected because its compression is single-threaded. It would be nice to have a compression level 1 that's a lot faster! For reference, I just got patches shipped to tarfile a few months ago to make it twice as fast. Even after that, compression at level 1 is still too slow for trivial usage such as compressing files and transferring them between machines. (I can't change the algorithm because Spark can only understand zip or tar.gz.) #121269 (comment) |
For now, this only exposes the crc32 function.
In that case, use python-isal. If a specialized use case warrants an extra-fast zlib, that is a much better option. In this case, where Python's zlib module backend is changed, I think the changes in behavior should be minimal.
Not if your network is "the internet". Then it really matters whether you send a 30 MiB or a 40 MiB file. Also, if you need fast but reasonable compression to save disk space, those differences are quite huge. Therefore I think the ultra-fast strategy in zlib-ng should be disabled, as it is not unlikely that this strategy will lead to unexpected nasty surprises for some users of CPython on Windows. For applications that do need the utmost speed, python-isal and python-zlib-ng are available. Python-zlib-ng, as a component outside the standard library, of course keeps the default zlib-ng behavior for level 1. |
We at zlib-ng think that if you already chose level 1, then you probably cared more about the speed than the compression ratio. Also, if you are interested in fast CRC, our current develop branch contains a much faster CRC implementation called Chorba, so make sure to test that as well. These are my benchmarks of C-Chorba vs the old braid implementation as of today on a 2011-era low-end AMD E-450 CPU.
|
Alright, I recall I had to patch Tornado many years ago to fix the compression level. That reminds me, there is a massive bug in the Python standard library around gzip: Python uses level 9 compression by default, which is extremely slow (around 1 MB/s). I should have sent a fix for the default level 9 years ago, when I was fixing Tornado, but then I forgot ^^ |
…n gzip, tarfile and bzip. It is the default level used by most compression tools and a reasonable tradeoff between speed and compression. Level 9 is very slow, on the order of 1 MB/s (varying with the machine), and was likely a major bottleneck for any software that forgot to specify the compression level.
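For reference, a short sketch of passing the level explicitly rather than relying on the default (the filenames are illustrative):

import gzip
import tarfile

data = b"some payload " * 1000

# gzip module: the historical default compresslevel is 9; pass 6 (or 1) explicitly.
blob = gzip.compress(data, compresslevel=6)

# tarfile accepts compresslevel for "w:gz" (and "w:bz2") modes as well.
with tarfile.open("example.tar.gz", "w:gz", compresslevel=6) as tar:
    tar.add("numpy.json")  # illustrative file name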
@ned-deily Make sure you enable zlib compatibility mode (and probably disable the gtest and gzip stuff), and it really does just drop in now. (I did a bit more work because I didn't want to be running CMake as part of our build, but the only change I made to zlibmodule was to add ZLIBNG_VERSION.) |
Compression level 1 is already labeled "best speed", which to me says we should use the option that provides the best speed. Provided it isn't making data bigger, it is doing exactly what it claims to, and it would be inaccurate if we didn't let it run at its best speed. I assume that switching to compression level 2 is going to be fairly close to the old compression level 1? And if you didn't actually mean "best speed", then selecting an option other than "best speed" seems like a reasonable way to express that. Changing the default from 9 to something else is likely fine, and we may as well do it at the same time, though I'd like to see a more concrete argument than "probably a better CPU/size tradeoff". I don't see any size benchmarks above covering level 6, and I'm not even sure what kind of files would be a reasonable benchmark here: probably the type of content likely to be served from a Python web app? JSON? Wheels get packed once and extracted millions of times, so the size benefit easily outweighs the compression time. |
If deflate_quick is enabled (…). If you do decide to disable deflate_quick, I'd still suggest you keep deflate_medium enabled. If so, instead of setting |
It is faster than the crc32 function from the zlib library. Update zipfile to detect and use lzma.crc32, zlib.crc32, or binascii.crc32, in order of preference.
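A rough sketch of what that preference order could look like; note that lzma.crc32 is the proposed addition here (only zlib.crc32 and binascii.crc32 exist today), so the first import simply falls through if it is absent:

try:
    from lzma import crc32  # proposed: fast CRC32 exposed via liblzma
except ImportError:
    try:
        from zlib import crc32  # usual case: zlib (or zlib-ng) is available
    except ImportError:
        from binascii import crc32  # last-resort fallback, always available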
@zooba Quick benchmark on compression levels, trying on the JSON file https://pypi.org/pypi/numpy/json. Levels above 6 have zero benefit: they are significantly slower and they don't compress better. That new laptop is doing well; gzip used to be single-digit MB/s at the higher compression levels!
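# Benchmark: gzip-compress numpy.json at each level and report time, throughput, and ratio.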
import pathlib
import gzip
import time
import zlib
data = pathlib.Path('numpy.json').read_bytes()
size = len(data)
for level in range(0, 10):
    start = time.perf_counter()
    compressed = gzip.compress(data, level)
    end = time.perf_counter()
    elapsed = end - start
    print(f"level={level} {len(data)//1024}kB->{len(compressed)//1024}kB in {elapsed:.3f} seconds {len(data)//1048576/elapsed:.2f} MB/s {len(compressed)/len(data)*100.0:.2f}% size")

EDIT: adding one run on bz2. I see bz2 is set to level=9 too; I guess we can keep it like that. The default for bzip tools seems to be level 9.
|
…n gzip and tarfile. It is the default level used by most compression tools and a better tradeoff between speed and compression.
I suggest not overthinking exactly what 1 means for now; we can tune that if it causes problems during the beta period. |
@morotti Thanks for those! Yeah, moving to 6 looks about right based on that, though it's broader than the implementation in the Windows distribution, so I'll look for others to chime in on it. Leaving this issue open for now until @ned-deily decides on the macOS build and we answer the default level question. |
Quick benchmark of the compression on numpy.json (see the script a few comments above), Windows laptop, turbo boost off. Quick takeaways:
I think we can all agree that zlib is obsolete. Can we start using |
FYI, I did some testing many years ago and determined that 128k-sized buffers were optimal for performance. zlib-ng/zlib-ng#868 |
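That finding is about zlib-ng's internals, but a comparable knob at the Python level is the chunk size used when streaming compressed data. A rough sketch, with illustrative filenames:

import gzip
import shutil

BUF_SIZE = 128 * 1024  # 128 KiB chunks, per the benchmark linked above

with gzip.open("example.json.gz", "rb") as src, open("example.json", "wb") as dst:
    shutil.copyfileobj(src, dst, length=BUF_SIZE)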
Hello, zlib-ng started being used for the Windows build a week ago and it's awesome :) |
Contributions to the Linux build (i.e. configure.ac & Makefile.pre.in) are welcome, so that when a recent enough version of zlib-ng is found, it is preferred over vanilla zlib.