@Ext3h Ext3h commented Nov 20, 2025

Rationale for this change

Avoid costly reallocations of the ZSTD context when reusing ZSTDCodec instances.

What changes are included in this PR?

Replace calls to ZSTD_compress / ZSTD_decompress, which allocate a ZSTD context internally on every call, with the corresponding APIs that take an explicitly managed context.
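For illustration, the difference between the two call styles can be sketched as follows. This is a minimal sketch, not the code in this PR: FakeContext, CompressOneShot and CachingCodec are hypothetical stand-ins so the example builds without libzstd, and memcpy stands in for the actual compression. In zstd itself, the explicit-context counterparts of ZSTD_compress / ZSTD_decompress are ZSTD_compressCCtx / ZSTD_decompressDCtx, which take a context created with ZSTD_createCCtx / ZSTD_createDCtx.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Counts how many contexts have been constructed (demo instrumentation only).
int g_context_allocations = 0;

// Hypothetical stand-in for an expensive, heap-allocated codec context.
struct FakeContext {
  FakeContext() { ++g_context_allocations; }
};

// One-shot style: a fresh context is set up for every single call.
std::size_t CompressOneShot(const std::uint8_t* in, std::size_t len,
                            std::uint8_t* out) {
  assert(in != nullptr && out != nullptr);
  FakeContext ctx;            // per-call context setup
  (void)ctx;
  std::memcpy(out, in, len);  // placeholder for the real compression work
  return len;
}

// Explicit-context style: the codec creates its context lazily, then reuses it.
class CachingCodec {
 public:
  std::size_t Compress(const std::uint8_t* in, std::size_t len,
                       std::uint8_t* out) {
    assert(in != nullptr && out != nullptr);
    if (!context_) {
      context_ = std::make_unique<FakeContext>();  // only on the first call
    }
    std::memcpy(out, in, len);  // placeholder for the real compression work
    return len;
  }

 private:
  std::unique_ptr<FakeContext> context_;
};
```

Note that a cached context is mutable state, so a codec written like CachingCodec is not safe to call from several threads at once without further measures.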

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions

⚠️ GitHub issue #48187 has been automatically assigned in GitHub to PR creator.

@Ext3h Ext3h force-pushed the zstd_context_reuse branch from 729ee5e to 859bf82 November 20, 2025 15:25
@felipecrv felipecrv left a comment

LGTM

@thisisnic thisisnic changed the title from "GH-48187: Cache ZSTD compression/decompression context" to "GH-48187: [C++] Cache ZSTD compression/decompression context" Nov 20, 2025
@github-actions github-actions bot added the "awaiting merge" label and removed the "awaiting review" label Nov 20, 2025
int64_t output_buffer_len, uint8_t* output_buffer) override {
size_t ret = ZSTD_compress(output_buffer, static_cast<size_t>(output_buffer_len),
input, static_cast<size_t>(input_len), compression_level_);
if (!compression_context_) {
Author

The IPC API parallelizes work in a way I didn't account for: the same ZSTDCodec instance is used by multiple threads concurrently. Revising, and checking whether thread_local is appropriate.

Author

thread_local is not appropriate, because there's a conflict on compression level.

Need to think carefully about this. It's the same problematic implementation detail for all the codecs: they all use the "single-shot" APIs for work that is done at high frequency, usually with tiny buffers, and, at least within IPC, they are also all used concurrently through the same codec instance...

Author

@Ext3h Ext3h Nov 20, 2025

It appears the issue is common to Brotli and ZSTD: both require more space than fits on the stack. Those two would therefore benefit from a thread-local context, as their context is always heap-allocated.

The others (LZ4, Snappy, BZIP, etc.) are all properly optimized in the sense that their context fits entirely on the stack. For them, the single-shot interfaces work as expected and carry no (significant) overhead. Only LZ4 additionally has a fast path for context reuse, but it's in the static-only part of the interface.
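A thread-local context along these lines could look roughly like the sketch below. It is an assumption-laden illustration rather than a proposal for this PR: FakeContext, ThreadLocalContext, Codec and ContextsCreatedBy are hypothetical names, the context is a stand-in rather than a real ZSTD or Brotli context, and re-applying the compression level on every call is just one possible way to deal with codecs of different levels sharing a thread's context. Whether that is acceptable for the real codecs is not settled in this thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts constructed contexts across all threads (demo instrumentation only).
std::atomic<int> g_contexts_created{0};

// Hypothetical stand-in for a heap-allocated codec context.
struct FakeContext {
  int level = 0;
  FakeContext() { ++g_contexts_created; }
};

// Returns the calling thread's context, configured for the requested level.
FakeContext& ThreadLocalContext(int level) {
  thread_local FakeContext ctx;  // constructed once per thread, on first use
  ctx.level = level;             // re-applied per call: codecs may differ in level
  return ctx;
}

class Codec {
 public:
  explicit Codec(int level) : level_(level) {}
  int level() const { return level_; }

  // Stands in for a compress call; reports the level the context was set to.
  int LevelUsedForCall() const { return ThreadLocalContext(level_).level; }

 private:
  int level_;  // immutable after construction, so sharing the codec is safe
};

// Runs the same codec from several threads; returns how many contexts
// were created while doing so.
int ContextsCreatedBy(int num_threads, int calls_per_thread, const Codec& codec) {
  const int before = g_contexts_created.load();
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&codec, calls_per_thread] {
      for (int i = 0; i < calls_per_thread; ++i) {
        const int used = codec.LevelUsedForCall();
        assert(used == codec.level());
        (void)used;
      }
    });
  }
  for (auto& w : workers) w.join();
  return g_contexts_created.load() - before;
}
```

In this sketch the codec object itself holds no mutable state, so one instance can be shared by many threads, and each thread pays for a context only once regardless of how many calls it makes.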

@Ext3h Ext3h marked this pull request as draft November 20, 2025 16:42