Lower SLT HEADROOM_FACTOR 8.0 -> 5.0 to surface nested_loop_join_spill leak#22721

Open
avantgardnerio wants to merge 1 commit into apache:main from coralogix:brent/headroom-5x

@avantgardnerio
Contributor

Which issue does this PR close?

Follow-up to #22626 (allocator-level memory tracking in SLTs).

Rationale for this change

HEADROOM_FACTOR in the SLT accounting pool is currently 8.0 — i.e. we accept allocator usage up to 8× the pool's declared limit before failing. That's loose enough to hide real leaks. Lowering it to 5.0 surfaces an untracked allocation in nested_loop_join_spill.slt.
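The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in accounting_pool.rs: the function names `allocator_budget` and `check_headroom` are hypothetical, and the headroom factor is passed as a parameter here only so that the two values can be compared side by side.

```rust
/// Allocator budget in bytes: the pool's declared limit scaled by the
/// headroom factor. (Hypothetical helper, named for illustration only.)
fn allocator_budget(pool_limit: usize, headroom_factor: f64) -> usize {
    (pool_limit as f64 * headroom_factor) as usize
}

/// Compares peak allocator usage against the budget.
/// Returns Ok(spare bytes) when usage is within budget,
/// or Err(bytes over budget) when it is not.
/// (Hypothetical helper; the real check may report failures differently.)
fn check_headroom(
    pool_limit: usize,
    headroom_factor: f64,
    allocator_peak: usize,
) -> Result<usize, usize> {
    let budget = allocator_budget(pool_limit, headroom_factor);
    if allocator_peak <= budget {
        Ok(budget - allocator_peak)
    } else {
        Err(allocator_peak - budget)
    }
}
```

With made-up numbers, a pool limit of 100,000 bytes and a peak of 600,000 bytes passes at a factor of 8.0 (budget 800,000) but fails at 5.0 (budget 500,000), which is the kind of usage the looser factor lets through.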

At 5× headroom, the first query in that suite allocates ~3.7 MB against a declared 150K limit. NestedLoopJoinExec has untracked allocation in at least three sites — generate_next_batch buffering, concat_batches at the spill boundary, and take_native during probe — plus the IPC reader path on spill re-read.

What changes are included in this PR?

One constant change in datafusion/sqllogictest/src/accounting_pool.rs: HEADROOM_FACTOR: f64 = 8.0 → 5.0, with an updated doc comment.

Are these changes tested?

CI is the test — nested_loop_join_spill.slt is expected to fail until the operator-side leaks are fixed. That's the intended signal of this PR, not a regression.

Are there any user-facing changes?

No. Test-infrastructure-only.

CI will fail on nested_loop_join_spill.slt. That's the point — at 5x
headroom, the first query allocates ~3.7 MB against a declared 150K
limit. NestedLoopJoinExec has untracked allocation in three distinct
sites (generate_next_batch buffering, concat_batches at the spill
boundary, and take_native during probe), plus the IPC reader path on
spill re-read. Fix the operator; don't relax the const.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@avantgardnerio
Contributor Author

CI surfaced exactly the failure this PR is meant to expose: nested_loop_join_spill.slt:33, allocator overdraft of -20245 bytes (peak ~770KB against a 150KB pool × 5× headroom = 750KB budget).
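As a sanity check on those figures, the implied peak is the budget plus the magnitude of the reported overdraft. The sketch below assumes the overdraft is reported as a negative remainder and uses a hypothetical helper, `implied_peak`. The comment does not say whether "KB" means 1000 or 1024 bytes, so both readings are worth trying.

```rust
/// Peak allocator usage implied by a pool limit, a headroom factor, and
/// the magnitude of a reported overdraft, all in bytes.
/// (Hypothetical helper for checking the arithmetic; not part of the PR.)
fn implied_peak(pool_limit: usize, headroom_factor: f64, overdraft: usize) -> usize {
    (pool_limit as f64 * headroom_factor) as usize + overdraft
}
```

Under the decimal reading, `implied_peak(150_000, 5.0, 20_245)` gives 770,245 bytes. Under the binary reading, `implied_peak(150 * 1024, 5.0, 20_245)` gives 788,245 bytes, about 769.8 KiB. Either way the result is consistent with the "~770KB" peak quoted above.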

Filed #22723 to track the underlying untracked allocation in the NestedLoopJoinExec spill path. This PR stays open as the test-side witness: once #22723 lands, the SLT goes green here.

Labels

sqllogictest SQL Logic Tests (.slt)
