bench: add predicate_eval SQL micro-benchmark suite for conjunctive filter evaluation #22704
Open
adriangb wants to merge 1 commit into
…ilter evaluation
Add `benchmarks/sql_benchmarks/predicate_eval`, an implementation-agnostic SQL benchmark suite that isolates the cost axes driving conjunctive (AND) filter evaluation in FilterExec's left-deep short-circuit path: cost-weighted ordering (cost/(1-sel)), per-predicate cost, selectivity, conjunct count, string-column width, row count, predicate correlation, selectivity drift, nulls, and an order-neutral overhead/regression guard.
The suite sets no engine config of its own. It measures DataFusion's built-in short-circuit by default; a predicate-ordering system under test is toggled via its native DATAFUSION_* env var (the bench harness builds its SessionContext with SessionConfig::from_env).
Synthetic data is generated inline by each subgroup's load SQL and sized with PRED_ROWS / PRED_FILL. Built on the existing .benchmark template framework (no engine code). Wired into bench.sh (`./bench.sh run predicate_eval`) and documented in the sql_benchmarks README.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Which issue does this PR close?
This PR does not close an issue. It adds a benchmark suite to support work and
discussion around predicate ordering in filter evaluation (e.g. the static
reordering in #22343 and the runtime/statistics-based reordering explored in
#22698). It deliberately benchmarks no specific implementation — see below.
Rationale for this change
Conjunctive (`AND`) filter evaluation in `FilterExec` is a left-deep `BinaryExpr(And)` chain, and the order conjuncts are evaluated in can change
runtime by large factors: once a leading conjunct passes few enough rows the
batch is physically compacted before the rest, so a cheap-and-selective
predicate evaluated early saves later predicates work. Predicate ordering is
therefore an active area (static heuristics, runtime/adaptive schemes, cost
models).
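To make the effect concrete, here is a minimal sketch of the idealised cost model behind the two rank forms mentioned in this PR: `cost/(1-sel)` and the Velox-style `time / (rows_in - rows_out)`. It assumes independent predicates and that every conjunct only sees rows that survived the previous ones, which is a simplification of what `FilterExec` actually does. The conjunct names, costs, and selectivities are invented for illustration.

```python
from itertools import permutations

# Hypothetical conjuncts: (name, cost per input row, selectivity).
# Selectivity here means the fraction of rows that PASS the predicate.
CONJUNCTS = [
    ("regex_match", 20.0, 0.50),
    ("int_compare", 1.0, 0.10),
    ("str_prefix", 5.0, 0.90),
]


def expected_cost(order):
    """Expected work per input row if each conjunct only sees surviving rows."""
    surviving = 1.0
    total = 0.0
    for _name, cost, sel in order:
        total += surviving * cost
        surviving *= sel
    return total


def rank(conjunct):
    """cost / (1 - sel): cost paid per row eliminated. Lower should run first."""
    _name, cost, sel = conjunct
    return cost / (1.0 - sel) if sel < 1.0 else float("inf")


def rank_from_counters(time_spent, rows_in, rows_out):
    """The same rank computed from measured counters instead of estimates."""
    eliminated = rows_in - rows_out
    return time_spent / eliminated if eliminated > 0 else float("inf")


ranked = sorted(CONJUNCTS, key=rank)
best = min(permutations(CONJUNCTS), key=expected_cost)
worst = max(permutations(CONJUNCTS), key=expected_cost)

print("ranked order:", [name for name, _, _ in ranked])
print(f"best  = {expected_cost(best):.2f} cost units per row")
print(f"worst = {expected_cost(worst):.2f} cost units per row")
```

Under these assumptions, sorting by the rank reproduces the cheapest of the six orderings, and the gap between best and worst is several-fold. Correlated predicates or drifting selectivities break the independence assumption, which is the kind of case the `correlation` and `drift` subgroups are meant to expose.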
There is currently no benchmark suite that isolates the dimensions that drive
this. Existing macro-benchmarks (TPC-H/DS, ClickBench) only incidentally
exercise filter ordering, so they can't show why a change to ordering helped
or hurt, or guard the order-insensitive case against regressions.
What changes are included in this PR?
A new SQL benchmark suite, `benchmarks/sql_benchmarks/predicate_eval`, built on
the existing `.benchmark` template framework (no engine code, no new Rust). It
sets no engine config of its own and measures DataFusion's built-in short-circuit
by default; a system under test is toggled purely via its native
`DATAFUSION_EXECUTION_*` env var (the bench harness builds its `SessionContext` with
`SessionConfig::from_env`), so the same scenarios can characterise the
baseline, a static heuristic, an adaptive scheme, or a cost model and be
compared apples-to-apples.
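As a rough sketch of how an A/B comparison might be set up, the helper below builds the command and environment for one run without executing anything. It assumes `BENCH_SUBGROUP` and `PRED_ROWS` are read from the environment, and the toggle name is a placeholder: the PR only names the `DATAFUSION_EXECUTION_*` prefix, so the real variable of the system under test has to be substituted. `bench_run` is a hypothetical helper, not part of the suite.

```python
import os


def bench_run(subgroup, rows, toggle=None):
    """Build (command, env) for one predicate_eval run; nothing is executed."""
    env = dict(os.environ)
    env["BENCH_SUBGROUP"] = subgroup  # assumption: read from the environment
    env["PRED_ROWS"] = str(rows)      # assumption: read from the environment
    if toggle is not None:
        name, value = toggle
        if not name.startswith("DATAFUSION_"):
            raise ValueError("expected a DATAFUSION_* variable for SessionConfig::from_env")
        env[name] = value
    return ["./bench.sh", "run", "predicate_eval"], env


# Baseline: DataFusion's built-in short-circuit, no toggle set.
baseline_cmd, baseline_env = bench_run("selectivity", 100_000)

# Candidate: same scenario with the system under test switched on.
# "DATAFUSION_EXECUTION_<OPTION>" is a placeholder, not a real option name.
candidate_cmd, candidate_env = bench_run(
    "selectivity", 100_000, toggle=("DATAFUSION_EXECUTION_<OPTION>", "true")
)
```

Each pair could then be handed to `subprocess.run(cmd, env=env)` from the directory that contains `bench.sh`, keeping everything except the toggle identical between the two runs.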
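It is organised into 10 subgroups (select with `BENCH_SUBGROUP`), each varying
one property of conjunctive filter evaluation while holding the others fixed:

| Subgroup | Property varied |
|---|---|
| `costsel` | cost-weighted ordering (`cost/(1-sel)`) |
| `cost` | per-predicate cost |
| `selectivity` | selectivity |
| `cardinality` | conjunct count (`k = 2/4/8/16`) |
| `width` | string-column width (`PRED_FILL` = 2 / 30 / 170 chars) |
| `scale` | row count (5k / 100k / 5M / 50M) |
| `neutral` | order-neutral overhead/regression guard |
| `correlation` | predicate correlation |
| `drift` | selectivity drift |
| `nulls` | nulls |

Each query's comment notes the per-predicate cost/selectivity that the data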
generation hides from the SQL. Data is synthetic and generated inline by each
subgroup's load SQL (no external files); `PRED_ROWS` sizes it and `PRED_FILL`
sets string width. Wired into `bench.sh` (`./bench.sh run predicate_eval`) and
documented in `benchmarks/sql_benchmarks/README.md`.
The design was informed by surveying how Velox drives the analogous decision
(it ranks by cycles-per-row-eliminated, `time / (rows_in - rows_out)`).
Are these changes tested?
These are benchmark definitions, not engine code. Each `.benchmark` includes an
`assert` that the generated table is non-empty, and every subgroup was run
locally at small `PRED_ROWS` to confirm the suite parses, loads, asserts, and
executes end-to-end. The queries are order-invariant (`SELECT count(*) ...`), so
any predicate-ordering system can also be checked for correctness by diffing
counts with the optimization on vs. off.
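The count-diffing check could look something like the sketch below. It assumes the `count(*)` results of each run have already been collected into a mapping from query name to count; how they are collected is out of scope here, and the query names and numbers are made up.

```python
def diff_counts(baseline, candidate):
    """Return {query: (baseline_count, candidate_count)} for every disagreement.

    A query that is missing from one side is reported with None on that side.
    """
    mismatches = {}
    for query in sorted(set(baseline) | set(candidate)):
        b, c = baseline.get(query), candidate.get(query)
        if b != c:
            mismatches[query] = (b, c)
    return mismatches


# Hypothetical results with the optimization off and on.
counts_off = {"q01": 1250, "q02": 0, "q03": 48211}
counts_on = {"q01": 1250, "q02": 0, "q03": 48210}

for query, (off, on) in diff_counts(counts_off, counts_on).items():
    print(f"{query}: off={off} on={on}")
```

An empty result means the reordering preserved the answer for every query. Any entry points at a correctness bug rather than a performance difference, because the counts do not depend on conjunct order.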
Are there any user-facing changes?
No. This only adds an opt-in benchmark suite and its documentation; no public
API, engine behavior, or default configuration changes.