Use zlib-ng (fast!) rather than mainline stale zlib in binary releases #91349

Open
gpshead opened this issue Apr 1, 2022 · 55 comments
Labels
build (The build process and cross-build), OS-mac, OS-windows, performance (Performance or resource usage), type-feature (A feature request or enhancement)

Comments

@gpshead
Member

gpshead commented Apr 1, 2022

BPO 47193
Nosy @gpshead, @pfmoore, @tjguk, @zware, @zooba, @corona10, @arhadthedev

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2022-04-01.19:19:24.173>
labels = ['3.11', 'OS-windows', 'performance']
title = 'Use zlib-ng rather than zlib in binary releases'
updated_at = <Date 2022-04-02.09:48:33.061>
user = 'https://github.com/gpshead'

bugs.python.org fields:

activity = <Date 2022-04-02.09:48:33.061>
actor = 'corona10'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Windows']
creation = <Date 2022-04-01.19:19:24.173>
creator = 'gregory.p.smith'
dependencies = []
files = []
hgrepos = []
issue_num = 47193
keywords = []
message_count = 1.0
messages = ['416508']
nosy_count = 7.0
nosy_names = ['gregory.p.smith', 'paul.moore', 'tim.golden', 'zach.ware', 'steve.dower', 'corona10', 'arhadthedev']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue47193'
versions = ['Python 3.11']

Linked PRs

@gpshead
Member Author

gpshead commented Apr 1, 2022

zlib-ng is an optimized zlib library with better performance on most architectures (with contributions from the likes of Google, Cloudflare, and Intel). It is API compatible with zlib. https://github.com/zlib-ng/zlib-ng

I believe the only platform where we don't use the OS's own zlib is Windows, so I'm tagging this issue Windows.

@gpshead gpshead added the 3.11 (only security fixes), OS-windows, and performance (Performance or resource usage) labels Apr 1, 2022
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@gpshead
Member Author

gpshead commented Jun 17, 2022

If this hasn't happened in the 3.11 betas, this is presumably bumped to 3.12.

@zooba
Member

zooba commented Jul 17, 2022

I started taking a look at this and it seems like we can build it without having to worry about their build system by renaming the zconf.h.in and zconf-ng.h.in files to remove the .in, and setting the ZLIB_COMPAT preprocessor definition in pythoncore.vcxproj (as well as referencing the files).

Running test_zlib, the only failures seem to be tests for certain errors that no longer occur, but I've got no idea how important they are. There's also no indication of the performance impact, or of anything else that may change (e.g. new DLL exports, etc.), but it certainly does seem like a feasible drop-in replacement.

@zooba zooba added the 3.12 (only security fixes) label and removed the 3.11 (only security fixes) label Jul 17, 2022
@rhpvorderman
Contributor

@zooba which failures are these? I have accumulated quite some experience with the zlib/gzip formats from working on python-isal, bindings for the ISA-L library that speeds up zlib-compatible compression by rewriting the algorithms in x86 assembly language. It is quite good, but not suitable as a drop-in replacement, hence the python-isal project.

Having said that I'd love to help out with anything zlib related in CPython.

@zooba
Member

zooba commented Aug 1, 2022

I knew I should've kept better track of the changed error messages 😄

Skimming through the tests, I'm pretty sure test_wbits failed in a few places because we expected some "invalid window size" errors that aren't errors with zlib-ng. I don't think any were critical, but I don't know the usage of the library well enough to be sure (e.g. do people expect these errors and change behaviour? Or are they always just developer error, where the change simply makes things work?)

@gpshead
Member Author

gpshead commented Aug 1, 2022

Checking internally for local patches to zlib-ng, this change zlib-ng/zlib-ng@ce6789c appears to fix that problem. I guess it hasn't landed in a zlib-ng release yet (I don't see it in the 2.0.x branch).

Let's make sure that lands in zlib-ng before using it in our releases, I guess.

Otherwise: I don't see any Python code using zlib.error that would care (internally, or in a large corpus of third-party OSS Python code). It is mostly only ever caught to bail out, or it follows this pattern of trying the alternate meaning of wbits for the zlib vs gzip format: https://github.com/urllib3/urllib3/blob/main/src/urllib3/response.py#L102
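For readers who don't want to chase the link, that fallback pattern looks roughly like this (a minimal sketch of the approach, not urllib3's actual code; the helper name is made up):

import zlib

def decompress_deflate(data: bytes) -> bytes:
    # Try zlib-wrapped deflate first; on zlib.error, retry assuming a raw
    # deflate stream (negative wbits), mirroring the urllib3 fallback above.
    try:
        return zlib.decompress(data, zlib.MAX_WBITS)
    except zlib.error:
        return zlib.decompress(data, -zlib.MAX_WBITS)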

@nmoinvaz

nmoinvaz commented Aug 1, 2022

I started taking a look at this and it seems like we can build it without having to worry about their build system by renaming the zconf.h.in and zconf-ng.h.in files to remove the .in, and setting the ZLIB_COMPAT preprocessor definition in pythoncore.vcxproj (as well as referencing the files).

I don't recommend bypassing the build system as it is used to determine features of the compiler and what optimizations to enable. If you only use Visual Studio then it is best to use cmake to generate your Visual Studio projects.

@zooba
Member

zooba commented Aug 1, 2022

I don't recommend bypassing the build system

Yeah, I figured this would be a bad idea, but we're also not going to run a separate build system every time. We'll run it once and keep the generated files in our source mirror. (This extra work is why I didn't bother setting it all up earlier, and just hacked things enough to make it build.)

@rhpvorderman
Contributor

Is this change intended to remain Windows-only? I am interested in packaging zlib-ng for Python. (It will ship zlib_ng and gzip_ng modules and be fully compatible in every way; at least, that is the plan.)
But this only makes sense if CPython is not going to switch to using zlib-ng by default any time soon. @gpshead, if I understand you correctly, this change is merely for complete distributions that do not dynamically link other libraries at runtime?

@gpshead
Member Author

gpshead commented Jan 20, 2023

There's nothing stopping you from making zlib-ng wrapping modules on PyPI if you wanted to (check to see if anyone already has first). It'd allow applications on any Python version you provide wheels for to choose to use the faster library without rebuilding anything themselves.

Linux distros decide what zlib implementation they use for their own libz, so that is not in our control; we don't want to vendor more third-party libraries there if we can help it. On macOS I assume we're using a platform zlib as well? But it'd be more reasonable to ship our own there, as we already have to do that for some other libraries such as OpenSSL.

But on Windows, we bring and build it all, so I suggest using zlib-ng on Windows only for starters.

zlib-ng is an API-compatible drop-in replacement for zlib.
As is the zlib implementation you can find in the Chromium repository: https://chromium.googlesource.com/chromium/src/third_party/zlib/.
Both are much faster than upstream stock zlib (and better maintained).

We switched to chromium's zlib internally at work. It offers a "compatibility mode" that produces bit-identical compressed output to zlib at the cost of some performance. That made it a lot easier to move a huge codebase across tons of different systems over to it, as there were some things making the incorrect assumption that compressed output is canonical for a given input.

If Python runs into that kind of issue during beta releases, I'd suggest we step back and look at the chromium zlib in compatibility mode instead of zlib-ng.


Honestly people designing new systems are better off using zstandard for all lossless compression purposes. But the reason zlib exists and is used is compatibility with existing systems that only accept that format as their lowest common denominator compressed data format.

@gpshead gpshead changed the title from "Use zlib-ng rather than zlib in binary releases" to "Use zlib-ng or chromium's zlib rather than mainline stale zlib in binary releases" Jan 20, 2023
@rhpvorderman
Contributor

There's nothing stopping you from making zlib-ng wrapping modules on PyPI if you wanted to (check to see if anyone already has first).

I checked, no-one yet... At least not on PyPI. As the python-isal maintainer, I also feel I can do this with rather less effort than someone who is not as experienced with python's zlibmodule. So I am going to try.

zlib-ng is an API-compatible drop-in replacement for zlib.
As is the zlib implementation you can find in the Chromium repository: https://chromium.googlesource.com/chromium/src/third_party/zlib/.
Both are much faster than upstream stock zlib (and better maintained).

We switched to chromium's zlib internally at work. It offers a "compatibility mode" that produces bit-identical compressed output to zlib at the cost of some performance. That made it a lot easier to move a huge codebase across tons of different systems over to it, as there were some things making the incorrect assumption that compressed output is canonical for a given input.

If Python runs into that kind of issue during beta releases, I'd suggest we step back and look at the chromium zlib in compatibility mode instead of zlib-ng.

That is an interesting thing to consider. Now I have to consider which one of those to wrap. Since zlib-ng is also available in conda (in the conda-forge channel) I think that is the most obvious choice for redistribution.

Honestly people designing new systems are better off using zstandard for all lossless compression purposes.

Depends. Intel's ISA-L igzip application compresses faster and at a better compression ratio than zstd can achieve at those speeds. It decompresses more slowly, though. However, one complete compress-decompress cycle is faster with igzip than with zstd. So for my work (bioinformatics), where a lot of files are run through multiple programs, this is optimal. Also, it is backwards-compatible with gzip, so it is quite a big win.
For most other use cases, though, I am inclined to agree that zstd is much better, especially if it gets its own assembly implementation (if that ever happens).
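For anyone curious what the drop-in usage looks like, here is a minimal sketch assuming the third-party python-isal package and its documented isal_zlib/igzip modules (not part of CPython):

# pip install isal  -- third-party package, not part of CPython
from isal import isal_zlib, igzip

payload = b"some repetitive payload " * 1024

# isal_zlib mirrors the stdlib zlib API (ISA-L only supports levels 0-3).
blob = isal_zlib.compress(payload, 1)
assert isal_zlib.decompress(blob) == payload

# igzip mirrors the stdlib gzip module and produces gzip-compatible output.
assert igzip.decompress(igzip.compress(payload, compresslevel=1)) == payload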

@rhpvorderman
Contributor

Hi. The python-zlib-ng project is going well: https://github.com/pycompression/python-zlib-ng. First release should be soon.

A couple of observations I thought I'd share which have relevance to this issue:

  • zlib-ng is a charm to build on Windows if CMake is available. Very easy.
  • zlib-ng is not as drop-in as its name suggests: compression level 1 is vastly different, sacrificing compression ratio for speed. (Files can be 50% bigger than files compressed with zlib.) In my opinion it is quite unacceptable for Python's zlib module to have such a big change in behaviour between releases. A couple of percent is doable, but this is just not nice, especially since the goal of compression is to have less data.
  • zlib-ng is volunteer-maintained and therefore has not had a stable release in more than a year. Releases are spotty at best. This would be fine, if it were not for the following point:
  • There is a bug in the current 2.0.6 release where the wbits parameter is effectively ignored at compression level 1. It is set to be at least 13, but this creates an issue where data compressed with wbits=9 can't be decompressed with wbits=9. This bug has already been fixed for a while, but still not released. (A small round-trip sketch follows this list.)
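A minimal round-trip sketch of that last point, using the stdlib zlib API (this passes with stock zlib; per the report, zlib-ng 2.0.6 effectively raised the window to at least 13 at level 1, so the wbits=9 decompression step could fail there):

import zlib

payload = b"some fairly repetitive payload " * 4096

# Compress at level 1 with a small window (wbits=9, i.e. 512 bytes).
co = zlib.compressobj(level=1, wbits=9)
stream = co.compress(payload) + co.flush()

# With stock zlib this decompresses fine using the same wbits=9 window.
do = zlib.decompressobj(wbits=9)
assert do.decompress(stream) + do.flush() == payload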

Now these are not deal-breakers when you want a zlib alternative that can be used alongside it to enhance performance, but I feel this is less desirable for CPython. A more regularly maintained, stable and predictable library is more appropriate.

If Python runs into that kind of issue during beta releases, I'd suggest we step back and look at the chromium zlib in compatibility mode instead of zlib-ng.

I think this is the way to go based on my experience so far. This gives people on Windows the least surprise. Of course YMMV, but I thought I would add my 2 cents and maybe save someone else some work in trying it out.

@nmoinvaz

nmoinvaz commented Jan 30, 2023

zlib-ng is not as drop-in as its name suggests: compression level 1 is vastly different, sacrificing compression ratio for speed. (Files can be 50% bigger than files compressed with zlib.) In my opinion it is quite unacceptable for Python's zlib module to have such a big change in behaviour between releases. A couple of percent is doable, but this is just not nice, especially since the goal of compression is to have less data.

At level 1, our goal is to sacrifice compression ratio for speed and we do this using Intel's quick deflate algorithm. It is possible to disable this algorithm using -D WITH_NEW_STRATEGIES=OFF. This should also solve your 4th bullet point about the bug.

Has anybody done any compression/decompression performance benchmarking with Chromium's zlib? As far as I remember, they weren't focused on improving compression, only decompression. A benchmark from the last release of zlib-ng can be found here; however, it only compares against the original mainline zlib.

@rhpvorderman
Contributor

rhpvorderman commented Jan 31, 2023

At level 1, our goal is to sacrifice compression ratio for speed and we do this using Intel's quick deflate algorithm. It is possible to disable this algorithm using -D WITH_NEW_STRATEGIES=OFF. This should also solve your 4th bullet point about the bug.

Ah, that is equivalent to ISA-L's level 0, I see. Thank you for the hint. The problem in the 4th bullet point seems fixed on the latest develop branch though, so I guess I will just wait it out until there is a new release. I think those new strategies are a key feature of zlib-ng, so I will leave them in for the Python bindings.

It does sway my opinion for CPython though. If it can be made more "drop-in" by turning on some options, it seems a good fit. zlib-ng is so easy to build on Windows; that can't be said for every C project out there.

As far as I remember they weren't focused on improving compression, only decompression.

That makes sense for a browser, right? It would be a bit weird if compression was a focus. I don't know if browsers regularly send compressed data? They do receive it quite a lot.

@Dead2

Dead2 commented Feb 2, 2023

I thought I should chime in here, and address the point about spotty releases. That is largely my fault and has to do with some health issues I have been going through, combined with a lot of code refactors and a bad strategy for backporting changes to the stable branch that requires me to do a lot of manual work.

There is currently a pre-release for a new 2.0 release, zlib-ng/zlib-ng#1393, and it mostly just lacks CI fixes so that tests work again. (The current plan for that is to copy over the new GitHub Actions files, remove CI runs that are incompatible with 2.0, and hopefully there will not be a lot more to fix manually.) That 2.0 release will be the last 2.0 release that we backport changes to; after that there will only be important fixes if needed.

The plan is to release a 2.1 beta shortly after the next 2.0 release is out.
Hopefully the zlib-ng core codebase is now stable enough (in terms of not needing more big refactors) that we can do a more rolling kind of release for 2.1.x. If that pans out, we will have more regular releases, each with a smaller number of changes and less chance of breakage between releases. That way there will also be little need for backporting.

I fully expect there to be bugs in the current 2.1 development branch due to the large amount of refactors. It has been working perfectly in several production systems for a long time now, but those do not exercise nearly all the various code paths that a full distribution adoption will see. The automated testing for 2.1 is quite extensive though, so I hope we did not miss anything truly important.

@rhpvorderman
Contributor

@Dead2 I hope your health issues are resolved soon. Take care!
Also thanks a lot for zlib-ng. It is quite an awesome library. I am especially awed by how easy it was to build and integrate, for a library that contains various optimizations for various architectures.

Hopefully the zlib-ng core codebase is now stable enough (in terms of not needing more big refactors) that we can do a more rolling kind of release for 2.1.x. If that pans out, we will have more regular releases, each with a smaller number of changes and less chance of breakage between releases. That way there will also be little need for backporting.

If I may give some unsolicited advice: don't keep a stable branch with backports. It is an extra maintenance burden. The code is only ever going to be as good as its test suite. This means the branch with the most extensive testing is going to be the most suitable for production. Making regular releases from the development head, while creating a new test for every bug that is reported, is therefore a very sound strategy for stable releases where no regressions occur.

The automated testing for 2.1 is quite extensive though, so I hope we did not miss anything truly important.

I will run the python-zlib-ng test suite on the branch as well. That has 14,000+ compatibility tests against zlib proper, so it may also catch it when something is amiss.

@rhpvorderman
Contributor

Issue with test_wbits should be fixed with the next stable release of zlib-ng.

@Dead2

Dead2 commented May 16, 2023

Just wanted to let you know that we are about to release beta2 of zlib-ng 2.1 shortly (within a day or two, unless a problem crops up). No significant bugs were found during beta1, but we did pull in a couple of changes that warrant another beta.

Also, regarding level 1 trading off more speed for lower compression: it can be disabled by setting the compiler define -DNO_QUICK_STRATEGY if you want the compression ratio to stay more in line with stock zlib.
We do not have a configure/CMake option for disabling only deflate_quick, and none is planned.

@hugovk
Member

hugovk commented Mar 18, 2025

zlib-ng sounds like a good idea.

We switched to zlib-ng for PNGs in the last Pillow release because it's much faster (up to 4x at highest compression).

And for file sizes: zlib-ng produces the same file sizes at compression level 0 and bigger files at compression level 1, but if you care about file size you'll use a higher compression level, which gives more or less the same file size (and zlib-ng is faster).

Benchmark graphs at python-pillow/Pillow#8500 (comment)

We had a couple of reports from people whose tests started failing, but that was because they were comparing the bytes of the images instead of the image content/pixels.
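To illustrate that failure mode (a generic sketch with plain zlib, not Pillow's actual tests): compressed byte streams are not canonical across implementations or settings, so the robust check compares the decompressed content.

import zlib

payload = b"example image-like data " * 512

# Simulate two encoders by using different compression levels; stock zlib
# vs zlib-ng can likewise produce different byte streams at the same level.
a = zlib.compress(payload, 1)
b = zlib.compress(payload, 9)

# Fragile: asserting a == b can break when the zlib implementation changes.
# Robust: the decompressed content must match.
assert zlib.decompress(a) == zlib.decompress(b) == payload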

@gpshead gpshead changed the title from "Use zlib-ng or chromium's zlib rather than mainline stale zlib in binary releases" to "Use zlib-ng (fast!) rather than mainline stale zlib in binary releases" Mar 18, 2025
@zooba
Member

zooba commented Mar 18, 2025

Okay, so let's do it.

Do we need to put the sources in the cpython-source-deps repository (where zlib currently lives), or do we prefer to vendor directly in our repo?

@gpshead
Member Author

gpshead commented Mar 18, 2025

I'd love to use the external repo(s) used by builds, as we do for zlib. (I think we also have such a repo/branch used by macOS builds as well? Though not for zlib yet.)

@rhpvorderman
Contributor

zlib-ng level 1 is compressing a lot worse AND a lot faster.

This isn't our default setting though, right? If it is, I think that's a problem, but if it's non-default then not a problem.

Correct. 1 is just what people explicitly use when they want "fastest" with zlib. I wouldn't pre-worry about the definition of 1 being different any more than about output differing between implementations.

To quote @nmoinvaz

At level 1, our goal is to sacrifice compression ratio for speed and we do this using Intel's quick deflate algorithm. It is possible to disable this algorithm using -D WITH_NEW_STRATEGIES=OFF. This should also solve your 4th bullet point about the bug.

I recommend doing this (building with WITH_NEW_STRATEGIES=OFF). A certain level of compression is expected from zlib level 1. The point of compression is to have smaller files, so suddenly having 30% bigger files may break some people's expectations. Especially in network scenarios this is not desirable.

@ned-deily
Member

(i think we also have such a repo/branch used by macos builds as well? though not for zlib yet)

We don't; for macOS installer builds we use the upstream source releases, occasionally with a patch as needed, for the third-party libs that aren't available from the OS or where we want a newer version. We currently use the macOS-supplied version of zlib, although we have supplied our own in the distant past. It sounds like it shouldn't be a big issue to use zlib-ng instead. I'll look into it and, if it looks good, I'll make a PR.

@morotti
Contributor

morotti commented Mar 19, 2025

At level 1, our goal is to sacrifice compression ratio for speed and we do this using Intel's quick deflate algorithm. It is possible to disable this algorithm using -D WITH_NEW_STRATEGIES=OFF. This should also solve your 4th bullet point about the bug.

I recommend turning this on. A certain level of compression is expected from zlib level 1. The point of compression is to have smaller files, so having suddenly 30% bigger files may break some people's expectations. Especially in network scenarios this is not desirable.

I have mixed feelings about that, because level 1 compression in zlib is still so slow (around 50 MB/s or less, depending on the CPU) that it is unusable for many use cases (and it's the main driver of people moving to zstd).
Anything like compressing a backup, a container image, or a large file is severely limited by the compression speed at level 1. It's slower than the disk can read and write. It's slower than the network (any gigabit Ethernet or any wireless LAN).

Also, please bear in mind that most server CPUs have many cores but the cores are slower (the fastest server CPUs now have half the single-core performance of an average desktop CPU); on the consumer front, many laptops have dramatically increased core counts and added efficiency cores, but those cores barely break the 1 GHz mark anymore. Single-core performance has gotten worse in the last few years, and zlib is heavily affected since its compression is single-threaded.

It would be nice to have a compress level 1 that's a lot faster!

For reference, I got patches shipped to tarfile a few months ago to make it twice as fast. Even after that, the compression is still too slow at level 1 for trivial use cases like compressing files and transferring them between machines. (I can't change the algorithm because Spark can only understand zip or tar.gz.) #121269 (comment)

morotti pushed a commit to man-group/cpython that referenced this issue Mar 19, 2025
for now, this only exposes the crc32 function.
@rhpvorderman
Contributor

Anything like compressing a backup, a container image, or a large file is severely limited by the compression speed at level 1. It's slower than the disk can read and write. It's slower than the network (any gigabit Ethernet or any wireless LAN).

In that case, use python-isal. If a specialized use case warrants the use of an extra-fast zlib, that is a much better option.

In this case where python's zlib module backend is changed, I think the changes in behavior should be minimal.

It's slower than the network (any gigabit Ethernet or any wireless LAN).

Not if your network is "the internet". Then it really matters whether you send a 30 MiB or a 40 MiB file. Also, if you need fast but reasonable compression to save disk space, those differences are quite large. Therefore I think the ultra-fast strategy in zlib-ng should be disabled, as it is not unlikely that this strategy will lead to unexpected nasty surprises for some users of CPython on Windows.

For applications that do need the utmost speed, python-isal and python-zlib-ng are available. python-zlib-ng, as a non-standard-library component, of course keeps the default zlib-ng behavior for level 1.
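For completeness, usage of that wrapper looks roughly like this (a sketch assuming python-zlib-ng's documented zlib_ng and gzip_ng module names; it is a third-party package, not part of CPython):

# pip install zlib-ng  -- third-party package, not part of CPython
from zlib_ng import zlib_ng, gzip_ng

payload = b"data to ship over the wire " * 1024

# zlib_ng mirrors the stdlib zlib API but is backed by zlib-ng, so level 1
# uses zlib-ng's fast strategy (bigger output, much higher throughput).
blob = zlib_ng.compress(payload, 1)
assert zlib_ng.decompress(blob) == payload

# gzip_ng mirrors the stdlib gzip module.
assert gzip_ng.decompress(gzip_ng.compress(payload)) == payload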

@Dead2

Dead2 commented Mar 19, 2025

We at zlib-ng think that if you already chose level 1, you probably cared more about speed than about compression ratio.
Some might get surprised, but having this much faster level 1 (nearly 2x the speed of level 2) also means that people can keep using deflate/gzip compression instead of having to migrate (if at all possible) to LZO/LZ4, which would require rewrites and breaking compatibility.

Also, if you are interested in fast CRC, our current develop branch contains a much faster CRC implementation called Chorba, so make sure to test that as well.
Currently only a C Chorba is implemented, but work is ongoing on SSE2 and other variants as well. Chorba is not used on CPUs with [V]PCLMULQDQ capabilities, as those are already faster with implementations utilizing those instructions. But Chorba is great for portability since it does not require any specific instruction sets.

These are my benchmarks of C-Chorba vs the old braid implementation as of today on a 2011-era low-end AMD E-450 cpu.

crc32/generic_chorba/1                          26.9 ns         26.8 ns     26157499
crc32/generic_chorba/8                          50.1 ns         49.9 ns     14035938
crc32/generic_chorba/12                         80.6 ns         80.3 ns      8719313
crc32/generic_chorba/16                         94.7 ns         94.3 ns      7426048
crc32/generic_chorba/32                          153 ns          153 ns      4584211
crc32/generic_chorba/64                          280 ns          279 ns      2500184
crc32/generic_chorba/512                         730 ns          726 ns       967636
crc32/generic_chorba/4096                       3772 ns         3755 ns       186302
crc32/generic_chorba/32768                     30684 ns        30535 ns        22923
crc32/generic_chorba/262144                   224062 ns       222951 ns         3140
crc32/generic_chorba/4194304                 3478228 ns      3404615 ns          206

crc32/braid/1                                   18.0 ns         17.9 ns     39011495
crc32/braid/8                                   44.0 ns         43.8 ns     15986689
crc32/braid/12                                  59.1 ns         58.9 ns     11871923
crc32/braid/16                                  73.2 ns         72.9 ns      9602697
crc32/braid/32                                   132 ns          132 ns      5319085
crc32/braid/64                                   260 ns          259 ns      2707563
crc32/braid/512                                 1283 ns         1277 ns       548611
crc32/braid/4096                                9249 ns         9210 ns        75989
crc32/braid/32768                              73471 ns        73154 ns         9567
crc32/braid/262144                            586380 ns       583859 ns         1198
crc32/braid/4194304                          9649556 ns      9563683 ns           73

@morotti
Contributor

morotti commented Mar 19, 2025

Alright, I recall I had to patch tornado many years ago to fix the compression level.
I am reviewing the source code and can still find the old comment: https://github.com/tornadoweb/tornado/blob/d5ac65c1f1453c2aeddd089d8e68c159645c13e1/tornado/web.py#L3247

That reminds me, there is a massive bug in the Python standard library around gzip: Python uses level 9 compression by default, which is extremely slow (around 1 MB/s).
Anything that is Python with the default settings will find itself massively bottlenecked by the compression.
The default level used by all the tools is level 6, which is a more reasonable tradeoff.

I should have sent a fix for the default level 9 years ago, when I was fixing tornado, but then I forgot ^^
Is there any objection if I send a patch to change the default compression level to 6 for tarfile, gzip, and co?
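Whatever the default ends up being, callers can already pick the tradeoff explicitly today; a minimal sketch with the existing stdlib APIs (an in-memory tar is used just to keep it self-contained):

import gzip
import io
import tarfile

data = b"payload " * 100_000

# gzip.compress currently defaults to compresslevel=9; pass 6 (or 1)
# explicitly to trade compression ratio for speed, whatever the default is.
blob = gzip.compress(data, compresslevel=6)
assert gzip.decompress(blob) == data

# tarfile's gzip writer accepts the same knob for "w:gz" modes.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz", compresslevel=6) as tar:
    info = tarfile.TarInfo(name="payload.bin")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))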

morotti pushed a commit to man-group/cpython that referenced this issue Mar 19, 2025
…n gzip, tarfile and bzip.

It is the default level used by most compression tools and a reasonable tradeoff between speed and compression. Level 9 is very slow, on the order of 1 MB/s (varying with the machine), and was likely a major bottleneck for any software that forgot to specify the compression level.
@zooba
Member

zooba commented Mar 19, 2025

@ned-deily Make sure you enable zlib compatibility mode and probably disable the gtest and gzip stuff, and it really does just drop in now. (I did a bit more work because I didn't want to be running CMake as part of our build, but the only change I made to zlibmodule was to add ZLIBNG_VERSION.)
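Assuming that ZLIBNG_VERSION attribute ships as described above, a build could be checked from Python like this (a sketch; on builds linked against stock zlib the attribute simply isn't there):

import zlib

# ZLIB_VERSION is what CPython was built against; ZLIB_RUNTIME_VERSION is
# what the loaded library reports at runtime.
print(zlib.ZLIB_VERSION, zlib.ZLIB_RUNTIME_VERSION)

# Per the comment above, a zlib-ng-backed build additionally exposes
# ZLIBNG_VERSION (assumed attribute name); fall back gracefully otherwise.
backend = getattr(zlib, "ZLIBNG_VERSION", None)
print("zlib-ng" if backend else "stock zlib", backend or "")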

@zooba
Member

zooba commented Mar 19, 2025

Compression level 1 is already labeled "best speed", which to me says we should use the option that provides the best speed. Provided it isn't making the data bigger, it is doing exactly what it claims to, and it would be inaccurate if we didn't let it run at the best speed.

I assume that switching to compression level 2 is going to be fairly close to the old compression level 1? And if you didn't actually mean "best speed", then selecting an option other than "best speed" seems like a reasonable way to express that.

Changing the default from 9 to something else is likely fine, and we may as well do it at the same time. Though I'd like to see a more concrete argument than "probably a better CPU/size tradeoff".

I don't see any size benchmarks above covering level 6, and I'm not even sure what kind of files would be a reasonable benchmark here - probably the type of content likely to be served from a Python web app? JSON? Wheels get packed once and extracted millions of times, so the size benefit easily outweighs the compression time.

@Dead2

Dead2 commented Mar 19, 2025

I assume that switching to compression level 2 is going to be fairly close to the old compression level 1? And if you didn't actually mean "best speed", then selecting an option other than "best speed" seems like a reasonable way to express that.

If deflate_quick is enabled (WITH_NEW_STRATEGIES=ON), then level 2 uses the settings level 1 normally uses. So in other words, zlib-ng level 2 with deflate_quick is exactly the same as level 1 is without deflate_quick.
In both cases, this is not exactly the same as zlib level 1 due to changes needed for better optimization, but very close.
Ref: https://github.com/zlib-ng/zlib-ng/blob/fd0d263cedab1a136f40d65199987e3eaeecfcbd/deflate.c#L146

If you do decide to disable deflate_quick, I'd still suggest you keep deflate_medium enabled. In that case, instead of setting WITH_NEW_STRATEGIES=OFF, you'd need to define NO_QUICK_STRATEGY as a CFLAG.
Compression with deflate_medium is very close to the original levels, just faster.

morotti pushed a commit to man-group/cpython that referenced this issue Mar 19, 2025
It is faster than the crc32 function from the zlib library.
Update zipfile to detect and use lzma.crc32, zlib.crc32, or binascii.crc32, in order of preference.
@morotti
Contributor

morotti commented Mar 19, 2025

@zooba Quick benchmark on compression levels.

Trying it on the JSON file https://pypi.org/pypi/numpy/json.
Python 3.14 beta on the main branch; Windows laptop, turbo boost off.

Levels above 6 have zero benefit: they are significantly slower and they don't compress better.
We can benchmark various types of data; the result will be similar. (There are plenty of zlib benchmarks online if you want to see more.)

That new laptop is doing well; gzip used to be single-digit MB/s at the higher compression levels!
It's so slow that I had to set web servers like tornado to compress at level=1 at multiple of my day jobs, otherwise they spent more time compressing responses than running the app.

gzip level=0 2331kB->2331kB in 0.005 seconds 438.96 MB/s 100.01% size
gzip level=1 2331kB->523kB in 0.037 seconds 53.88 MB/s 22.48% size
gzip level=2 2331kB->513kB in 0.036 seconds 56.11 MB/s 22.02% size
gzip level=3 2331kB->508kB in 0.039 seconds 51.37 MB/s 21.81% size
gzip level=4 2331kB->477kB in 0.065 seconds 30.64 MB/s 20.50% size
gzip level=5 2331kB->468kB in 0.074 seconds 26.95 MB/s 20.11% size
gzip level=6 2331kB->459kB in 0.088 seconds 22.64 MB/s 19.71% size
gzip level=7 2331kB->456kB in 0.098 seconds 20.51 MB/s 19.60% size
gzip level=8 2331kB->453kB in 0.139 seconds 14.37 MB/s 19.44% size
gzip level=9 2331kB->453kB in 0.147 seconds 13.62 MB/s 19.44% size
import pathlib
import gzip
import time

# numpy.json is a local copy of https://pypi.org/pypi/numpy/json (see above)
data = pathlib.Path('numpy.json').read_bytes()
size = len(data)
for level in range(0, 10):
    start = time.perf_counter()
    compressed = gzip.compress(data, level)
    end = time.perf_counter()
    elapsed = end - start
    print(f"level={level} {len(data)//1024}kB->{len(compressed)//1024}kB in {elapsed:.3f} seconds {len(data)//1048576/elapsed:.2f} MB/s {len(compressed)/len(data)*100.0:.2f}% size")

EDIT: adding one run on bz2. I see bz2 is set to level=9 too. I guess we can keep it like that; the default for bzip2 tools seems to be level 9.

bz2 level=1 2331kB->466kB in 0.312 seconds 6.41 MB/s 20.01% size
bz2 level=2 2331kB->451kB in 0.332 seconds 6.02 MB/s 19.38% size
bz2 level=3 2331kB->446kB in 0.338 seconds 5.91 MB/s 19.14% size
bz2 level=4 2331kB->442kB in 0.351 seconds 5.70 MB/s 18.99% size
bz2 level=5 2331kB->440kB in 0.358 seconds 5.58 MB/s 18.90% size
bz2 level=6 2331kB->439kB in 0.375 seconds 5.33 MB/s 18.84% size
bz2 level=7 2331kB->438kB in 0.371 seconds 5.39 MB/s 18.83% size
bz2 level=8 2331kB->437kB in 0.384 seconds 5.21 MB/s 18.75% size
bz2 level=9 2331kB->437kB in 0.394 seconds 5.08 MB/s 18.77% size

morotti pushed a commit to man-group/cpython that referenced this issue Mar 19, 2025
…n gzip and tarfile

It is the default level used by most compression tools and a better tradeoff between speed and compression.
@gpshead
Member Author

gpshead commented Mar 19, 2025

I suggest not overthinking what exactly 1 means for now; we can tune that if it causes problems during the beta period.

@zooba
Member

zooba commented Mar 19, 2025

@morotti Thanks for those! Yeah, moving to 6 looks about right based on that, though it's a broader change than just the Windows distribution, so I'll look for others to chime in on it.

Leaving this issue open for now until @ned-deily decides on the macOS build and we answer the default level question.

@morotti
Contributor

morotti commented Mar 20, 2025

Quick benchmark of compression on numpy.json (see the script a few comments above).
This is not meant to be an exhaustive benchmark.

Windows laptop, turbo boost off.
Main branch yesterday, before the PR to use zlib-ng, vs main branch today with zlib-ng.

Quick takeaways:
Level 1 is nearly three times faster but compresses a lot worse, as expected. It's great: this will allow a lot of legacy apps that were stuck on zip/tar.gz and bottlenecked by slow compression/decompression speed to run more effectively.
Level 2 is nearly twice as fast and compresses slightly better than ALL the old zlib levels up to 8. It's awesome.

algorithm library level size_decompressed size_compressed duration_s speed_mb_s ratio
gzip-0 gzip 0 2331 2331 0.005 422.09 1.00
gzip-1 gzip 1 2331 523 0.036 55.91 0.22
gzip-2 gzip 2 2331 513 0.036 54.98 0.22
gzip-3 gzip 3 2331 508 0.038 52.1 0.22
gzip-4 gzip 4 2331 477 0.066 30.44 0.21
gzip-5 gzip 5 2331 468 0.074 26.99 0.20
gzip-6 gzip 6 2331 459 0.088 22.83 0.20
gzip-7 gzip 7 2331 456 0.096 20.85 0.20
gzip-8 gzip 8 2331 453 0.138 14.5 0.19
gzip-9 gzip 9 2331 453 0.147 13.58 0.19
gzip(zlib-ng)-0 gzip(zlib-ng) 0 2331 2331 0.004 526.03 1.00
gzip(zlib-ng)-1 gzip(zlib-ng) 1 2331 687 0.014 144.37 0.29
gzip(zlib-ng)-2 gzip(zlib-ng) 2 2331 471 0.022 92.42 0.20
gzip(zlib-ng)-3 gzip(zlib-ng) 3 2331 456 0.034 59.18 0.20
gzip(zlib-ng)-4 gzip(zlib-ng) 4 2331 448 0.037 54.19 0.19
gzip(zlib-ng)-5 gzip(zlib-ng) 5 2331 442 0.041 48.19 0.19
gzip(zlib-ng)-6 gzip(zlib-ng) 6 2331 437 0.049 40.42 0.19
gzip(zlib-ng)-7 gzip(zlib-ng) 7 2331 432 0.049 41.14 0.19
gzip(zlib-ng)-8 gzip(zlib-ng) 8 2331 428 0.063 31.65 0.18
gzip(zlib-ng)-9 gzip(zlib-ng) 9 2331 476 0.079 25.4 0.20

I think we can all agree that zlib is obsolete. Can we start using zlib-ng on the Linux builds too?
Please, please, please.

@nmoinvaz

FYI, I did some testing many years ago and determined that 128 KB buffers were the most optimal for performance. zlib-ng/zlib-ng#868

@morotti
Contributor

morotti commented Mar 27, 2025

Hello,

zlib-ng started being used for the Windows build a week ago and it's awesome :)
Any plan to start using it for the Linux build soon?

@gpshead
Member Author

gpshead commented Mar 27, 2025

Contributions to the Linux build (i.e. configure.ac & Makefile.pre.in) are welcome, so that when a recent enough version of zlib-ng is found it is preferred over vanilla zlib.
