[BufferPool] BufferPool API design and necessary scheduler / runtime adaptions#37
Closed
e-strauss wants to merge 1 commit into
…adaptions

- Introduce DAG linearization and input release planning in the optimizer pipeline
- Add BufferManager and BufferPool with static consumer counting, pinning, and fold-aware lifecycle management
- Refactor Op.process() to return computed results instead of mutating internal state; extract argument resolution helpers
- Update scheduler to plan buffer lifecycles ahead of execution and flush stale buffers across folds
- Adjust scheduler and buffer pool to support pinned op planning data and deterministic intermediate release
- Add runtime components for buffer lifecycle tracking and handle-based storage
- Update API and tests to work with linearized execution plans and new release semantics
- Include comprehensive test suite for buffer pool (registration, retrieval, pinning, and split-phase scenarios) and benchmark for intermediate release
This PR updates how stratum handles intermediates. Previously, all intermediates were stored directly in the Ops, and each Op directly accessed these fields on its input operators. These changes decouple intermediate storage from Op execution by storing the intermediates in the BufferPool. The BufferPool API consists mainly of three methods:
In the optimization step, after linearization of the DAG, an analysis pass determines the point at which an Op's intermediate can be released. It looks at all consumers of the Op, and in particular at whether the Op's output is consumed multiple times. For example, the inputs of the split operation are needed for all CV folds.
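To illustrate the idea, here is a minimal sketch of release planning over a linearized DAG. It is not the code from this PR: the function name `plan_releases`, the op names, and the `pinned` parameter are hypothetical. The sketch assumes each op can be released right after its last consumer in the linearized order, unless it is pinned (for example, because it is needed again in every CV fold).

```python
from collections import defaultdict


def plan_releases(order, inputs, pinned=frozenset()):
    """Return {op: [intermediates that can be released after op ran]}.

    order  -- op names in linearized execution order (hypothetical format)
    inputs -- maps each op name to the names of the ops it consumes
    pinned -- ops whose intermediates must survive (e.g. reused per fold)
    """
    # Record, for every producer, the step of its last consumer.
    last_use = {}
    for step, op in enumerate(order):
        for src in inputs.get(op, ()):
            last_use[src] = step  # later consumers overwrite earlier ones

    # Schedule the release right after that last consumer.
    releases = defaultdict(list)
    for src, step in last_use.items():
        if src not in pinned:
            releases[order[step]].append(src)
    return dict(releases)


# Hypothetical pipeline: "split" is consumed twice, "clean" is pinned
# because the split op needs its input again for every CV fold.
order = ["load", "clean", "split", "fit", "score"]
inputs = {
    "clean": ["load"],
    "split": ["clean"],
    "fit": ["split"],
    "score": ["split", "fit"],
}
plan = plan_releases(order, inputs, pinned={"clean"})
```

In this sketch, `load` is released as soon as `clean` has run, `split` is kept until its second consumer `score` has run, and the pinned `clean` never appears in the plan.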
Right now, the buffer pool is merely a simple HashMap from Ops to intermediates. The next steps are to add memory thresholds and intermediate sizes, as a basis for eviction logic and serialization that avoid running out of memory.
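A map-backed pool with static consumer counting and pinning could look roughly like the sketch below. This is an assumption-laden illustration, not the PR's `BufferPool`: the class name `SimpleBufferPool` and the method names `put`, `get`, and `release` are hypothetical and are not claimed to be the three API methods mentioned above.

```python
class SimpleBufferPool:
    """Toy buffer pool: a dict from op id to intermediate (hypothetical API)."""

    def __init__(self):
        self._buffers = {}    # op id -> stored intermediate
        self._remaining = {}  # op id -> consumers that have not read it yet
        self._pinned = set()  # op ids that are never released automatically

    def put(self, op_id, value, consumers, pinned=False):
        """Store an intermediate together with its planned consumer count."""
        self._buffers[op_id] = value
        self._remaining[op_id] = consumers
        if pinned:
            self._pinned.add(op_id)

    def get(self, op_id):
        """Return the intermediate; drop it once the last consumer has read it."""
        value = self._buffers[op_id]
        if op_id not in self._pinned:
            self._remaining[op_id] -= 1
            if self._remaining[op_id] <= 0:
                self.release(op_id)
        return value

    def release(self, op_id):
        """Explicitly drop an intermediate, e.g. when flushing after a fold."""
        self._buffers.pop(op_id, None)
        self._remaining.pop(op_id, None)
        self._pinned.discard(op_id)

    def __contains__(self, op_id):
        return op_id in self._buffers
```

With this design, an intermediate registered with `consumers=2` disappears from the pool after the second `get`, while a pinned one stays until `release` is called. Memory thresholds and eviction would need size tracking on top of this.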
Benchmark
The lower memory consumption that results from releasing unnecessary intermediates can be seen in the following plots of the reserved physical memory of the stratum process on this benchmark use case:
With input release planning, we can free the intermediate of our custom UDF as soon as it has been processed, which keeps the required memory low.
Without planning
With release planning
Most important changes: