GH-48187: [C++] Cache ZSTD compression/decompression context #48192
base: main
729ee5e to 859bf82
felipecrv left a comment
LGTM
    int64_t output_buffer_len, uint8_t* output_buffer) override {
      size_t ret = ZSTD_compress(output_buffer, static_cast<size_t>(output_buffer_len),
                                 input, static_cast<size_t>(input_len), compression_level_);
      if (!compression_context_) {
There's parallelization in the IPC API that I didn't account for: the same instance of ZSTDCodec is used by multiple threads concurrently. I'm revising this and checking whether thread_local is appropriate.
thread_local is not appropriate, because there's a conflict on compression level.
I need to think carefully about this. It's the same problematic implementation detail for all the codecs: they all use the single-shot APIs for work that is done at high frequency, usually with tiny buffers, and at least within IPC they are all used concurrently with the same codec instance.
It appears the issue is common to Brotli and ZSTD: both require more space than fits on the stack, so their context is always heap-allocated. Those two would benefit from a thread-local context.
The others (LZ4, Snappy, BZIP, etc.) are already well optimized in the sense that their context fits entirely on the stack. For them, the single-shot interfaces work as expected and carry no significant overhead. Only LZ4 still has a fast path for context reuse, but it is in the static-only part of the interface.
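As a rough illustration of the thread-local idea discussed in this thread, here is a minimal sketch using only the standard library. `ScratchContext`, `SharedCodec` and `RunDemo` are hypothetical names, not Arrow or zstd types, and the sketch assumes the compression level is applied on every call instead of living permanently in the cached context, which is one possible way around the level conflict mentioned above.

```cpp
#include <atomic>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for a heap-allocated codec context (e.g. a ZSTD_CCtx).
struct ScratchContext {
  std::vector<unsigned char> workspace;
  int level = 0;
};

// Counts context allocations, for illustration only.
static std::atomic<int> g_contexts_created{0};

// Returns the calling thread's context, allocating it on first use only.
ScratchContext& ThreadLocalContext() {
  thread_local std::unique_ptr<ScratchContext> ctx;
  if (!ctx) {
    ctx = std::make_unique<ScratchContext>();
    ctx->workspace.resize(1 << 16);
    g_contexts_created.fetch_add(1);
  }
  return *ctx;
}

// Hypothetical codec: immutable after construction, so one instance can be
// shared by many threads. The level is set on the context per call.
class SharedCodec {
 public:
  explicit SharedCodec(int level) : level_(level) {}
  // Pretend-compress: returns the level the context was configured with.
  int Compress() const {
    ScratchContext& ctx = ThreadLocalContext();
    ctx.level = level_;
    return ctx.level;
  }

 private:
  const int level_;
};

struct DemoResult {
  int contexts_created;
  int level_mismatches;
};

// Runs two codecs with different levels from several threads at once.
DemoResult RunDemo(int num_threads, int calls_per_thread) {
  g_contexts_created.store(0);
  std::atomic<int> mismatches{0};
  const SharedCodec fast(1), strong(19);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < calls_per_thread; ++i) {
        if (fast.Compress() != 1) mismatches.fetch_add(1);
        if (strong.Compress() != 19) mismatches.fetch_add(1);
      }
    });
  }
  for (auto& w : workers) w.join();
  return {g_contexts_created.load(), mismatches.load()};
}
```

In this sketch each worker thread allocates its context once and reuses it for every call, regardless of which codec instance makes the call.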
Rationale for this change
Avoid costly reallocations of the ZSTD context when reusing ZSTDCodec instances.
What changes are included in this PR?
Replace calls to ZSTD_compress/ZSTD_decompress, which allocate the ZSTD context internally, with the corresponding APIs that take an explicitly managed context.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.