HEADROOM_FACTOR 6.0 -> 5.0: surface nested_loop_join_spill leak #424
Closed
avantgardnerio wants to merge 1 commit into
CI will fail on nested_loop_join_spill.slt. That's the point: at 5x headroom, the first query allocates ~3.7 MB against a declared 150K limit. NestedLoopJoinExec has untracked allocation in three distinct sites (generate_next_batch buffering, concat_batches at the spill boundary, and take_native during probe), plus the IPC reader path on spill re-read. Fix the operator; don't relax the const.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author
Superseded by apache#22721, rebased onto upstream/main now that the tracker framework (apache#22626) is merged.
Summary
One-line drop of HEADROOM_FACTOR from 6.0 to 5.0 on top of brent/memory-accounting.

Expected outcome: CI red on nested_loop_join_spill.slt. That's the point: the operator's untracked allocation overflows a 5× headroom envelope around its declared 150K memory limit.

Background
The parent branch caps HEADROOM_FACTOR at 6.0 with the comment "600% high, but that's what it takes to pass the SLT suite right now. Goal should be ~10%". This PR is the next click down the dial. It demonstrates that the framework is doing what it claims: actual allocations diverge wildly from MemoryPool::reserved().

What fires
nested_loop_join_spill.slt's first query (100K-row INNER JOIN with non-equijoin predicate under SET datafusion.runtime.memory_limit = '150K') panics with OverdraftPanic. Cumulative bank debit reaches ~3.7 MB before failing. The leak comes from at least three operator sites:

- LazyMemoryStream::generate_next_batch: Vec::with_capacity for the generate_series output buffer.
- concat_batches inside NestedLoopJoinStream::handle_buffering_left_memory_limited: fresh primitive buffers via concat_primitives.
- take_native inside process_left_range_join: ScalarBuffer::from_iter collecting take results.

Plus arrow_ipc::reader::MessageReader::maybe_next from SpillReaderStream::poll_next_inner once the operator gets into the spill read path.

None go through MemoryReservation::try_grow.

Test plan
- nested_loop_join_spill.slt with allocator overdraft: account balance at panic = ....
- cargo test --features memory-accounting --profile ci -p datafusion-sqllogictest --test sqllogictests -- nested_loop_join_spill --default-pool-size-mb 16384 exits non-zero.
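
For readers unfamiliar with the distinction drawn under "What fires", the gap between an allocation the pool can see and one it cannot might be sketched roughly as below. This is a std-only toy model, not DataFusion code: ToyPool, ToyReservation, untracked_buffer and tracked_buffer are hypothetical names invented for illustration, the u64 element type is arbitrary, and reading '150K' as 150 * 1024 bytes is an assumption. Only the name try_grow mirrors the MemoryReservation::try_grow call mentioned in the description.

```rust
// Toy model only: ToyPool and ToyReservation are hypothetical stand-ins,
// not DataFusion's MemoryPool / MemoryReservation.
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical pool: a byte limit plus a running total of reserved bytes.
struct ToyPool {
    limit: usize,
    reserved: Cell<usize>,
}

/// Hypothetical reservation handle tied to one pool.
struct ToyReservation {
    pool: Rc<ToyPool>,
    size: usize,
}

impl ToyReservation {
    fn new(pool: Rc<ToyPool>) -> Self {
        Self { pool, size: 0 }
    }

    /// Grow the reservation, or fail if the pool limit would be exceeded.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        let next = self.pool.reserved.get() + bytes;
        if next > self.pool.limit {
            return Err(format!(
                "cannot reserve {bytes} bytes: {next} > {}",
                self.pool.limit
            ));
        }
        self.pool.reserved.set(next);
        self.size += bytes;
        Ok(())
    }
}

impl Drop for ToyReservation {
    /// Return this reservation's bytes to the pool.
    fn drop(&mut self) {
        self.pool.reserved.set(self.pool.reserved.get() - self.size);
    }
}

/// Untracked: allocates without telling the pool, so reserved stays put.
fn untracked_buffer(rows: usize) -> Vec<u64> {
    Vec::with_capacity(rows)
}

/// Tracked: reserves first, and only allocates if the reservation succeeds.
fn tracked_buffer(res: &mut ToyReservation, rows: usize) -> Result<Vec<u64>, String> {
    res.try_grow(rows * std::mem::size_of::<u64>())?;
    Ok(Vec::with_capacity(rows))
}

fn main() {
    // Assumption: '150K' is read as 150 * 1024 bytes.
    let pool = Rc::new(ToyPool { limit: 150 * 1024, reserved: Cell::new(0) });
    let mut res = ToyReservation::new(Rc::clone(&pool));

    // The allocation happens, but the pool's view of usage does not change.
    let buf = untracked_buffer(100_000);
    println!(
        "untracked: capacity {} rows, pool sees {} bytes",
        buf.capacity(),
        pool.reserved.get()
    );

    // The same request through the reservation is refused by the pool.
    match tracked_buffer(&mut res, 100_000) {
        Ok(_) => println!("tracked: reserved {} bytes", pool.reserved.get()),
        Err(e) => println!("tracked: {e}"),
    }
}
```

In this model the untracked path always succeeds and leaves the pool's reserved count unchanged, which is the kind of divergence from MemoryPool::reserved() the PR is meant to surface. The tracked path gives the pool a chance to refuse the request, at which point a real operator would presumably spill or return an error rather than allocate.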