[BufferPool] BufferPool API design and necessary scheduler / runtime adaptions by e-strauss · Pull Request #37 · deem-data/stratum

e-strauss · 2026-04-10T15:00:53Z

This PR updates how stratum handles intermediates. Previously, we stored all intermediates directly in the Ops and each Op was directly accesses these fields on their inputs operators. These changes decouple the intermediate storage from the Op execution, by storing the intermediates in the BufferPool. The BufferPool API consists mainly of three methods:

PUT(Op, Intermediate) -> None
GET(Op) -> Intermediate
RELEASE(Op) -> None

In the optimization step, after linerization of the DAG, we do analysis pass and find out at which point we can release a Op's intermediate by looking at all consumers of an Op and especially if an Op's output is consumed multiple times. E.g. the inputs of the split operation is needed for all CV folds.

Right now, the buffer pool is merely a simple HashMap of Ops and intermediates. Next steps, are to add memory thresholds and intermediate sizes for building eviction logic and serialization to avoid running out of memory.

Benchmark

The lower memory consumption because of releasing of unnecessary intermediates can be seen in the following plots of the reserved physical memory of the stratum process on this benchmark use case:

def dummy_func(x, t: float=0.1):
    indices = np.arange(len(x))
    out = x.iloc[indices]
    sleep(t)
    return out
    
def main():
     df = skrub.as_data_op(f"input_{n}.csv").skb.apply_func(pd.read_csv)
     X = df.drop("y", axis=1).skb.mark_as_X()
    y = df["y"].skb.mark_as_y()
    for i in range(5):   # 5 times copy the input frame to simulate some processing
        X = X.skb.apply_func(dummy_func, t=0.3)
    model = DummyRegressor()
    pred = X.skb.apply(model, y=y)
    pred.skb.make_grid_search()

With the input release planning, we are able to immediately free intermediate of our custom UDF once it is processed and keep the required memory low.

Without planning

With release planning

Most important changes:

Introduce DAG linearization and input release planning in the optimizer pipeline
Add BufferManager and BufferPool with static consumer counting, pinning, and fold-aware lifecycle management
Refactor Op.process() to return computed results instead of mutating internal state; extract argument resolution helpers
Update scheduler to plan buffer lifecycles ahead of execution and flush stale buffers across folds
Adjust scheduler and buffer pool to support pinned op planning data and deterministic intermediate release
Add runtime components for buffer lifecycle tracking and handle-based storage
Update API and tests to work with linearized execution plans and new release semantics
Include comprehensive test suite for buffer pool (registration, retrieval, pinning, and split-phase scenarios) and benchmark for intermediate release

codecov · 2026-04-10T15:04:34Z

Codecov Report

❌ Patch coverage is 97.08029% with 8 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
stratum/runtime/_scheduler.py	92.50%	3 Missing ⚠️
stratum/optimizer/_linearization.py	93.54%	1 Missing and 1 partial ⚠️
stratum/optimizer/ir/_dataframe_ops.py	96.77%	2 Missing ⚠️
stratum/optimizer/_op_utils.py	85.71%	1 Missing ⚠️

Flag	Coverage Δ
unittests	`89.50% <97.08%> (+0.74%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
stratum/_api.py	`93.10% <100.00%> (ø)`
stratum/optimizer/_input_release_planning.py	`100.00% <100.00%> (ø)`
stratum/optimizer/_optimize.py	`92.85% <100.00%> (+1.32%)`	⬆️
stratum/optimizer/ir/_numeric_ops.py	`100.00% <100.00%> (ø)`
stratum/optimizer/ir/_ops.py	`98.02% <100.00%> (+0.04%)`	⬆️
stratum/runtime/_buffer_pool.py	`100.00% <100.00%> (ø)`
stratum/optimizer/_op_utils.py	`86.02% <85.71%> (+0.31%)`	⬆️
stratum/optimizer/_linearization.py	`93.54% <93.54%> (ø)`
stratum/optimizer/ir/_dataframe_ops.py	`94.52% <96.77%> (-0.09%)`	⬇️
stratum/runtime/_scheduler.py	`87.58% <92.50%> (+3.37%)`	⬆️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…adaptions - Introduce DAG linearization and input release planning in the optimizer pipeline - Add BufferManager and BufferPool with static consumer counting, pinning, and fold-aware lifecycle management - Refactor Op.process() to return computed results instead of mutating internal state; extract argument resolution helpers - Update scheduler to plan buffer lifecycles ahead of execution and flush stale buffers across folds - Adjust scheduler and buffer pool to support pinned op planning data and deterministic intermediate release - Add runtime components for buffer lifecycle tracking and handle-based storage - Update API and tests to work with linearized execution plans and new release semantics - Include comprehensive test suite for buffer pool (registration, retrieval, pinning, and split-phase scenarios) and benchmark for intermediate release

e-strauss force-pushed the buffer-pool-squashed branch from 50b4d66 to 8d217b9 Compare April 16, 2026 12:25

e-strauss force-pushed the buffer-pool-squashed branch from 8d217b9 to e2ec0c0 Compare April 16, 2026 12:35

e-strauss closed this in d0f7af4 Apr 23, 2026

e-strauss deleted the buffer-pool-squashed branch April 29, 2026 12:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BufferPool] BufferPool API design and necessary scheduler / runtime adaptions#37

[BufferPool] BufferPool API design and necessary scheduler / runtime adaptions#37
e-strauss wants to merge 1 commit into
deem-data:mainfrom
e-strauss:buffer-pool-squashed

e-strauss commented Apr 10, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Apr 10, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

e-strauss commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark

Without planning

With release planning

Most important changes:

Uh oh!

codecov Bot commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

e-strauss commented Apr 10, 2026 •

edited

Loading

codecov Bot commented Apr 10, 2026 •

edited

Loading