bench: add predicate_eval SQL micro-benchmark suite for conjunctive filter evaluation by adriangb · Pull Request #22704 · apache/datafusion

adriangb · 2026-06-02T04:21:10Z

Which issue does this PR close?

This PR does not close an issue. It adds a benchmark suite to support work and
discussion around predicate ordering in filter evaluation (e.g. the static
reordering in #22343 and the runtime/statistics-based reordering explored in
#22698). It deliberately does not benchmark any specific implementation (see below).

Rationale for this change

In FilterExec, a conjunctive (AND) filter is evaluated as a left-deep
BinaryExpr(And) chain, and the order in which conjuncts are evaluated can change
runtime by large factors. Once a leading conjunct passes few enough rows, the
batch is physically compacted before the remaining conjuncts run, so evaluating
a cheap, selective predicate early saves work for every predicate after it.
Predicate ordering is therefore an active area of work (static heuristics,
runtime/adaptive schemes, cost models).
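To make the ordering effect concrete, here is a rough cost model of a left-deep AND chain. It is a sketch under stated assumptions, not FilterExec's actual logic: selectivities are independent, per-row costs are fixed, and compaction is modelled as happening whenever the surviving fraction falls to a hypothetical `compact_threshold` (0.2 here).

```python
from itertools import permutations

# Each conjunct: (name, cost per row evaluated, selectivity = fraction of rows passing).
Pred = tuple[str, float, float]


def chain_cost(preds: list[Pred], rows: int, compact_threshold: float = 0.2) -> float:
    """Estimate the work done by a left-deep AND chain evaluated in list order.

    Assumptions (not DataFusion's exact rules): selectivities are independent,
    and the batch is compacted to the surviving rows whenever the surviving
    fraction drops to `compact_threshold` or below (hypothetical value).
    """
    batch = float(rows)  # rows physically present in the batch
    live = float(rows)   # rows that satisfy every conjunct evaluated so far
    total = 0.0
    for _name, cost, sel in preds:
        if live == 0:
            break  # no candidate rows left: the remaining conjuncts are skipped
        total += cost * batch  # an uncompacted batch is evaluated in full
        live *= sel
        if live <= compact_threshold * batch:
            batch = live  # compact before evaluating the next conjunct
    return total


preds = [("cheap_selective", 1.0, 0.01), ("expensive_broad", 50.0, 0.9)]
for order in permutations(preds):
    print([name for name, _, _ in order], chain_cost(list(order), rows=1000))
# ['cheap_selective', 'expensive_broad'] 1500.0
# ['expensive_broad', 'cheap_selective'] 51000.0
```

Under these assumptions, putting the cheap, selective conjunct first does roughly 34x less work, because the expensive conjunct then runs only on the 10 compacted survivors.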

There is currently no benchmark suite that isolates the dimensions driving this.
Existing macro-benchmarks (TPC-H/DS, ClickBench) exercise filter ordering only
incidentally, so they cannot show why a change to ordering helped or hurt, or
guard the order-insensitive case against regressions.

What changes are included in this PR?

A new SQL benchmark suite, benchmarks/sql_benchmarks/predicate_eval, built on
the existing .benchmark template framework (no engine code, no new Rust). The
suite sets no engine config of its own, so by default it measures DataFusion's
built-in short-circuit evaluation. A system under test is toggled purely via its
native DATAFUSION_EXECUTION_* environment variable (the bench harness builds its
SessionContext with SessionConfig::from_env), so the same scenarios can
characterise the baseline, a static heuristic, an adaptive scheme, or a cost
model and compare them apples-to-apples.
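The toggling relies on the environment-variable naming used by `SessionConfig::from_env`. The sketch below assumes the convention of DataFusion's `ConfigOptions::from_env` (upper-case the dotted config key and replace `.` with `_`) and builds the environment for one run. `config_key_to_env_var` and `bench_env` are hypothetical helpers, and `datafusion.execution.batch_size` only stands in for whatever key a system under test actually exposes.

```python
import os
from typing import Mapping, Optional


def config_key_to_env_var(key: str) -> str:
    """Env var name that SessionConfig::from_env is assumed to read for a config key.

    Mirrors the ConfigOptions::from_env convention: upper-case the dotted key
    and replace '.' with '_'.
    """
    return key.upper().replace(".", "_")


def bench_env(overrides: Mapping[str, str],
              base: Optional[Mapping[str, str]] = None) -> dict:
    """Build the environment for one bench run with the given config overrides."""
    env = dict(os.environ if base is None else base)
    for key, value in overrides.items():
        env[config_key_to_env_var(key)] = value
    return env


print(config_key_to_env_var("datafusion.execution.batch_size"))
# DATAFUSION_EXECUTION_BATCH_SIZE
```

A driver could then pass the result as the `env=` argument of `subprocess.run(["./bench.sh", "run", "predicate_eval"], env=..., check=True)`, setting `BENCH_SUBGROUP` in the same mapping to restrict the run to one subgroup; the baseline run simply omits the override.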

It is organised into 10 subgroups (select one with BENCH_SUBGROUP), each varying
one property of conjunctive filter evaluation while holding the others fixed:

| Subgroup | What it varies (others held fixed) |
|---|---|
| `costsel` | cost and selectivity point in different directions (the expensive predicate is the selective one) |
| `cost` | per-predicate cost, at equal selectivity |
| `selectivity` | per-predicate selectivity, at equal cost |
| `cardinality` | conjunct count `k = 2/4/8/16` |
| `width` | string-column width (`PRED_FILL` = 2 / 30 / 170 chars) |
| `scale` | row count `5k / 100k / 5M / 50M` |
| `neutral` | predicates are interchangeable (equal cost, none selective); an order-insensitive control |
| `correlation` | conditional vs marginal selectivity (independent / positively / anti-correlated) |
| `drift` | selectivity that changes across the scan |
| `nulls` | null density (two- vs three-valued predicate results) |
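The `correlation` subgroup targets the gap between a predicate's marginal selectivity (over all rows) and its conditional selectivity (over the rows that actually reach it). A small sketch with hypothetical predicates, not the suite's generated data, shows how the same marginal selectivity can hide very different conditional behaviour:

```python
def selectivities(rows, first, second):
    """Marginal vs conditional selectivity of `second` when it runs after `first`."""
    survivors = [r for r in rows if first(r)]
    marginal = sum(1 for r in rows if second(r)) / len(rows)
    conditional = sum(1 for r in survivors if second(r)) / len(survivors)
    return marginal, conditional


def a(i):
    return i % 10 == 0  # 10% of rows pass


rows = range(1000)
cases = {
    "independent": lambda i: (i // 10) % 10 == 0,  # 10%, unrelated to a
    "positive": lambda i: i % 10 == 0,             # 10%, passes exactly when a passes
    "anti": lambda i: i % 10 == 5,                 # 10%, never passes when a passes
}
for name, b in cases.items():
    print(name, selectivities(rows, a, b))
# independent (0.1, 0.1)
# positive (0.1, 1.0)
# anti (0.1, 0.0)
```

An ordering decision made from marginal statistics alone treats all three cases as identical, even though the second conjunct eliminates nothing in the positive case and everything in the anti-correlated one.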

Each query's comment notes the per-predicate cost and selectivity that the data
generation hides from the SQL. Data is synthetic and generated inline by each
subgroup's load SQL (no external files); PRED_ROWS sets the row count and
PRED_FILL sets the string width. The suite is wired into bench.sh
(./bench.sh run predicate_eval) and documented in benchmarks/sql_benchmarks/README.md.
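One common way to hide a predicate's selectivity from the SQL text is to drive it from a cyclic column. The Python sketch below is purely illustrative (the suite does this in each subgroup's load SQL, whose exact form is not reproduced here); `make_rows`, the two-column layout, and the modulus of 1000 are hypothetical, with `pred_rows` and `pred_fill` playing the roles of PRED_ROWS and PRED_FILL.

```python
def make_rows(pred_rows: int, pred_fill: int, selectivity: float):
    """Hypothetical generator: k cycles through 0..999, so the hidden predicate
    `k < threshold` passes `selectivity` of rows; s is `pred_fill` chars wide."""
    threshold = round(selectivity * 1000)
    rows = [(i % 1000, "x" * pred_fill) for i in range(pred_rows)]
    return rows, threshold


rows, threshold = make_rows(pred_rows=5000, pred_fill=30, selectivity=0.01)
passing = sum(1 for k, _ in rows if k < threshold)
print(threshold, passing / len(rows))  # 10 0.01
```

The SQL then only says `k < 10`; the 1% selectivity is a property of the data, which is why each query's comment records it.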

The design was informed by a survey of how Velox makes the analogous decision:
it ranks conjuncts by cycles per row eliminated, time / (rows_in - rows_out).
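The ranking described above can be sketched as follows. The helper name and the measurements are hypothetical; only the key, time / (rows_in - rows_out), comes from the description. A conjunct that eliminates no rows gets an infinite key so that it sorts last.

```python
import math


def cycles_per_row_eliminated(time: float, rows_in: int, rows_out: int) -> float:
    """Ranking key time / (rows_in - rows_out); lower keys run earlier."""
    eliminated = rows_in - rows_out
    return math.inf if eliminated <= 0 else time / eliminated


# Hypothetical per-conjunct measurements: (name, time, rows_in, rows_out).
stats = [
    ("expensive_broad", 100.0, 1000, 900),
    ("passes_everything", 5.0, 1000, 1000),
    ("cheap_selective", 10.0, 1000, 10),
]
order = sorted(stats, key=lambda s: cycles_per_row_eliminated(*s[1:]))
print([name for name, *_ in order])
# ['cheap_selective', 'expensive_broad', 'passes_everything']
```

If time = cost × rows_in and rows_out = sel × rows_in, the key reduces to cost / (1 - sel), the cost-weighted ordering named in the commit message.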

Note: the scale subgroup's q52/q53 build 5M- and 50M-row tables (the latter is
~9 GB); run a single point with BENCH_QUERY if that is too heavy.

Are these changes tested?

These are benchmark definitions, not engine code. Each .benchmark includes an
assert that the generated table is non-empty, and every subgroup was run
locally at small PRED_ROWS to confirm the suite parses, loads, asserts, and
executes end-to-end. The queries are order-invariant (SELECT count(*) ...), so
any predicate-ordering system can also be checked for correctness by diffing
counts with the optimization on vs. off.
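The correctness check relies on AND being commutative for `count(*)`: every evaluation order selects the same rows. The sketch below illustrates this with hypothetical predicates and adds a hypothetical `diff_counts` helper for comparing per-query counts from two runs; how those counts are extracted from the bench output is left open.

```python
from itertools import permutations


def count_where_all(rows, preds):
    """SELECT count(*) ... WHERE p1 AND p2 AND ...; all() stops at the first False."""
    return sum(1 for r in rows if all(p(r) for p in preds))


def diff_counts(baseline: dict, candidate: dict) -> dict:
    """Queries whose counts differ between two runs, as {query: (baseline, candidate)}."""
    return {
        q: (baseline.get(q), candidate.get(q))
        for q in sorted(baseline.keys() | candidate.keys())
        if baseline.get(q) != candidate.get(q)
    }


rows = range(10_000)
preds = [lambda i: i % 3 == 0, lambda i: i % 7 != 0, lambda i: i < 5_000]
counts = {count_where_all(rows, list(p)) for p in permutations(preds)}
print(len(counts))  # 1: every evaluation order yields the same count
print(diff_counts({"q1": 42, "q2": 7}, {"q1": 42, "q2": 8}))  # {'q2': (7, 8)}
```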

Are there any user-facing changes?

No. This only adds an opt-in benchmark suite and its documentation; no public
API, engine behavior, or default configuration changes.

…ilter evaluation

Add `benchmarks/sql_benchmarks/predicate_eval`, an implementation-agnostic SQL benchmark suite that isolates the cost axes driving conjunctive (AND) filter evaluation in FilterExec's left-deep short-circuit path: cost-weighted ordering (cost/(1-sel)), per-predicate cost, selectivity, conjunct count, string-column width, row count, predicate correlation, selectivity drift, nulls, and an order-neutral overhead/regression guard.

The suite sets no engine config of its own. It measures DataFusion's built-in short-circuit by default; a predicate-ordering system under test is toggled via its native DATAFUSION_* env var (the bench harness builds its SessionContext with SessionConfig::from_env).

Synthetic data is generated inline by each subgroup's load SQL and sized with PRED_ROWS / PRED_FILL. Built on the existing .benchmark template framework (no engine code). Wired into bench.sh (`./bench.sh run predicate_eval`) and documented in the sql_benchmarks README.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

adriangb force-pushed the predicate-eval-benchmarks branch from 5ae3acf to a09a05a on June 2, 2026 04:33

adriangb force-pushed the predicate-eval-benchmarks branch from a09a05a to 22c632f on June 2, 2026 04:41

adriangb marked this pull request as ready for review on June 2, 2026 04:46

adriangb mentioned this pull request on Jun 2, 2026 in perf: Reorder predicates in conjuncts via simple heuristic #22343 (Open)

adriangb wants to merge 1 commit into apache:main from pydantic:predicate-eval-benchmarks
