Skip to content

HEADROOM_FACTOR 6.0 -> 5.0: surface nested_loop_join_spill leak#424

Closed
avantgardnerio wants to merge 1 commit into
brent/memory-accountingfrom
brent/memory-accounting-5x
Closed

HEADROOM_FACTOR 6.0 -> 5.0: surface nested_loop_join_spill leak#424
avantgardnerio wants to merge 1 commit into
brent/memory-accountingfrom
brent/memory-accounting-5x

Conversation

@avantgardnerio
Copy link
Copy Markdown

Summary

One-line drop of HEADROOM_FACTOR from 6.0 to 5.0 on top of brent/memory-accounting.

Expected outcome: CI red on nested_loop_join_spill.slt. That's the point — the operator's untracked allocation overflows a 5× headroom envelope around its declared 150K memory limit.

Background

The parent branch caps HEADROOM_FACTOR at 6.0 with the comment "600% high, but that's what it takes to pass the SLT suite right now. Goal should be ~10%". This PR is the next click down the dial. It demonstrates that the framework is doing what it claims: actual allocations diverge wildly from MemoryPool::reserved().

What fires

nested_loop_join_spill.slt's first query (100K-row INNER JOIN with non-equijoin predicate under SET datafusion.runtime.memory_limit = '150K') panics with OverdraftPanic. Cumulative bank debit reaches ~3.7 MB before failing. The leak comes from at least three operator sites:

  1. LazyMemoryStream::generate_next_batchVec::with_capacity for the generate_series output buffer.
  2. concat_batches inside NestedLoopJoinStream::handle_buffering_left_memory_limited — fresh primitive buffers via concat_primitives.
  3. take_native inside process_left_range_joinScalarBuffer::from_iter collecting take results.

Plus arrow_ipc::reader::MessageReader::maybe_next from SpillReaderStream::poll_next_inner once the operator gets into the spill read path.

None go through MemoryReservation::try_grow.

Test plan

  • CI fails on nested_loop_join_spill.slt with allocator overdraft: account balance at panic = ....
  • All other SLTs pass (no collateral regressions from the 1× difference).
  • Local repro: cargo test --features memory-accounting --profile ci -p datafusion-sqllogictest --test sqllogictests -- nested_loop_join_spill --default-pool-size-mb 16384 exits non-zero.

CI will fail on nested_loop_join_spill.slt. That's the point — at 5x
headroom, the first query allocates ~3.7 MB against a declared 150K
limit. NestedLoopJoinExec has untracked allocation in three distinct
sites (generate_next_batch buffering, concat_batches at the spill
boundary, and take_native during probe), plus the IPC reader path on
spill re-read. Fix the operator; don't relax the const.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@avantgardnerio
Copy link
Copy Markdown
Author

Superseded by apache#22721 — rebased onto upstream/main now that the tracker framework (apache#22626) is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant