bench: add predicate_eval SQL micro-benchmark suite for conjunctive filter evaluation #22704

Open
adriangb wants to merge 1 commit into apache:main from pydantic:predicate-eval-benchmarks

adriangb (Contributor) commented Jun 2, 2026

Which issue does this PR close?

This PR does not close an issue. It adds a benchmark suite to support work and
discussion around predicate ordering in filter evaluation (e.g. the static
reordering in #22343 and the runtime/statistics-based reordering explored in
#22698). It deliberately benchmarks no specific implementation — see below.

Rationale for this change

Conjunctive (AND) filter evaluation in FilterExec is a left-deep
BinaryExpr(And) chain, and the order in which conjuncts are evaluated can
change runtime by large factors: once a leading conjunct passes few enough
rows, the batch is physically compacted before the remaining conjuncts run, so
a cheap, selective predicate evaluated early saves work for every predicate
after it. Predicate ordering is therefore an active area of work (static
heuristics, runtime/adaptive schemes, cost models).
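To make the compaction effect concrete, the toy model below evaluates a left-deep AND over a batch, compacts after every conjunct, and tallies a cost for each row a conjunct touches. It is a sketch under stated assumptions, not FilterExec: `Predicate`, `cost_per_row`, `evaluate_conjunction`, and `demo_predicates` are hypothetical names, costs are arbitrary units, and it compacts unconditionally, whereas the real operator compacts only once few enough rows pass.

```rust
// Toy model of left-deep short-circuit AND evaluation with compaction.
// Hypothetical names throughout; this is not DataFusion's FilterExec.

/// A conjunct with a per-row evaluation cost in arbitrary units.
struct Predicate {
    cost_per_row: u64,
    test: fn(i64) -> bool,
}

/// Evaluates conjuncts in `order`, compacting the batch after each one so
/// later conjuncts only see surviving rows. Unlike the real operator, this
/// always compacts. Returns (matching rows, total cost).
fn evaluate_conjunction(rows: &[i64], order: &[&Predicate]) -> (usize, u64) {
    let mut batch: Vec<i64> = rows.to_vec();
    let mut total_cost: u64 = 0;
    for pred in order {
        // Every row still in the batch pays this conjunct's cost.
        total_cost += pred.cost_per_row * batch.len() as u64;
        // Compaction: drop rows that failed this conjunct.
        batch.retain(|&v| (pred.test)(v));
    }
    (batch.len(), total_cost)
}

/// Two hypothetical conjuncts: cheap and selective (passes 1% of rows),
/// and expensive but unselective (passes 90% of rows).
fn demo_predicates() -> (Predicate, Predicate) {
    let cheap_selective = Predicate { cost_per_row: 1, test: |v| v % 100 == 0 };
    let expensive = Predicate { cost_per_row: 50, test: |v| v % 10 != 7 };
    (cheap_selective, expensive)
}

fn main() {
    let rows: Vec<i64> = (0..10_000).collect();
    let (cheap, expensive) = demo_predicates();
    // Same answer either way; only the work differs.
    println!("cheap first:     {:?}", evaluate_conjunction(&rows, &[&cheap, &expensive]));
    println!("expensive first: {:?}", evaluate_conjunction(&rows, &[&expensive, &cheap]));
}
```

Both orders return 100 matching rows, but leading with the cheap, selective conjunct costs 15,000 units versus 509,000 when the expensive one runs first over the full batch.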

There is currently no benchmark suite that isolates the dimensions that drive
ordering performance. Existing macro-benchmarks (TPC-H/DS, ClickBench) exercise
filter ordering only incidentally, so they can't show why a change to ordering
helped or hurt, nor guard the order-insensitive case against regressions.

What changes are included in this PR?

A new SQL benchmark suite, benchmarks/sql_benchmarks/predicate_eval, built on
the existing .benchmark template framework (no engine code, no new Rust). The
suite sets no engine config of its own, so by default it measures DataFusion's
built-in short-circuit. A system under test is toggled purely via its native
DATAFUSION_EXECUTION_* env var (the bench harness builds its SessionContext
with SessionConfig::from_env). This lets the same scenarios characterise the
baseline, a static heuristic, an adaptive scheme, or a cost model and compare
them apples-to-apples.
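Because the harness uses SessionConfig::from_env, an env var alone can switch a system on. As an illustration only, the sketch below assumes the naming convention DataFusion uses for env-derived config (the config key upper-cased with dots replaced by underscores, so `datafusion.execution.batch_size` is read from `DATAFUSION_EXECUTION_BATCH_SIZE`); verify the exact rule against the DataFusion configuration docs. `config_key_to_env_var` and `overrides_from_env` are hypothetical helpers, not DataFusion APIs.

```rust
/// Assumed naming convention: upper-case the key, replace '.' with '_'.
fn config_key_to_env_var(key: &str) -> String {
    key.to_uppercase().replace('.', "_")
}

/// Hypothetical: for each known config key, look up its env var and
/// collect (key, value) overrides for the ones that are set.
fn overrides_from_env<F>(known_keys: &[&str], lookup: F) -> Vec<(String, String)>
where
    F: Fn(&str) -> Option<String>,
{
    known_keys
        .iter()
        .filter_map(|key| {
            let var = config_key_to_env_var(key);
            lookup(&var).map(|value| (key.to_string(), value))
        })
        .collect()
}

fn main() {
    let keys = [
        "datafusion.execution.batch_size",
        "datafusion.execution.target_partitions",
    ];
    // Read overrides from the real process environment.
    for (key, value) in overrides_from_env(&keys, |var| std::env::var(var).ok()) {
        println!("{key} = {value}");
    }
}
```

Taking the lookup as a closure keeps the mapping testable without touching the process environment. The specific variable that enables any particular predicate-ordering scheme depends on that scheme and is not named here.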

It is organised into 10 subgroups (select with BENCH_SUBGROUP), each varying
one property of conjunctive filter evaluation while holding the others fixed:

| Subgroup | What it varies (others held fixed) |
|---|---|
| costsel | cost and selectivity point in different directions (the expensive predicate is the selective one) |
| cost | per-predicate cost, at equal selectivity |
| selectivity | per-predicate selectivity, at equal cost |
| cardinality | conjunct count k = 2 / 4 / 8 / 16 |
| width | string-column width (PRED_FILL = 2 / 30 / 170 chars) |
| scale | row count 5k / 100k / 5M / 50M |
| neutral | predicates are interchangeable (equal cost, none selective); an order-insensitive control |
| correlation | conditional vs. marginal selectivity (independent / positively / anti-correlated) |
| drift | selectivity that changes across the scan |
| nulls | null density (two- vs. three-valued predicate results) |
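The correlation subgroup matters because a cost model that multiplies marginal selectivities implicitly assumes the conjuncts are independent. The sketch below, on hypothetical integer predicates unrelated to the suite's actual load SQL, compares a second predicate's marginal selectivity with its conditional selectivity among rows that already passed a first predicate; `selectivity` and `conditional_selectivity` are illustrative names.

```rust
/// Marginal selectivity: fraction of `rows` for which `pred` holds.
/// Returns 0.0 for an empty input to avoid dividing by zero.
fn selectivity(rows: &[i64], pred: impl Fn(i64) -> bool) -> f64 {
    if rows.is_empty() {
        return 0.0;
    }
    let passing = rows.iter().filter(|&&v| pred(v)).count();
    passing as f64 / rows.len() as f64
}

/// Conditional selectivity of `second` among rows that passed `first`,
/// i.e. the pass rate `second` actually sees when it runs after `first`.
fn conditional_selectivity(
    rows: &[i64],
    first: impl Fn(i64) -> bool,
    second: impl Fn(i64) -> bool,
) -> f64 {
    let survivors: Vec<i64> = rows.iter().copied().filter(|&v| first(v)).collect();
    selectivity(&survivors, second)
}

fn main() {
    let rows: Vec<i64> = (0..1_000).collect();
    let first = |v: i64| v % 10 < 5; // 50% marginal
    let candidates: [(&str, fn(i64) -> bool); 3] = [
        ("independent", |v| (v / 10) % 2 == 0),
        ("positively correlated", |v| v % 10 < 4),
        ("anti-correlated", |v| v % 10 >= 5),
    ];
    for (label, second) in candidates {
        println!(
            "{label}: marginal {:.2}, given first {:.2}",
            selectivity(&rows, second),
            conditional_selectivity(&rows, first, second)
        );
    }
}
```

Against `v % 10 < 5`, the independent predicate keeps its 50% rate, the positively correlated one rises from 40% to 80%, and the anti-correlated one falls from 50% to 0%, so an ordering chosen from marginal rates alone can misjudge how much work the second conjunct saves.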

Each query's comment notes the per-predicate cost/selectivity that the data
generation hides from the SQL. Data is synthetic and generated inline by each
subgroup's load SQL (no external files); PRED_ROWS sizes it and PRED_FILL
sets string width. Wired into bench.sh (./bench.sh run predicate_eval) and
documented in benchmarks/sql_benchmarks/README.md.
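Since the SQL never states a predicate's selectivity, the data has to encode it. The suite does this in each subgroup's load SQL; what follows is a hypothetical Rust analogue, not that SQL, showing one way to plant a fixed selectivity (`key = i % 100`, so `key < s` passes s% of rows), a drifting one, and a fixed-width string column standing in for PRED_FILL. All function names are illustrative.

```rust
/// Uniform key in 0..100, so `key < s` passes s% of every block of 100 rows.
fn uniform_key(i: u64) -> u64 {
    i % 100
}

/// Drifting key: uniform over 0..100 for the first half of the scan, then
/// confined to 0..10, so `key < 10` goes from 10% to 100% selectivity.
fn drifting_key(i: u64, total_rows: u64) -> u64 {
    if i < total_rows / 2 { i % 100 } else { i % 10 }
}

/// Fixed-width string column; `fill` stands in for PRED_FILL.
fn payload(i: u64, fill: usize) -> String {
    let c = (b'a' + (i % 26) as u8) as char;
    std::iter::repeat(c).take(fill).collect()
}

/// Pass rate of `pred` over consecutive windows ("batches") of the scan.
fn windowed_pass_rate(keys: &[u64], window: usize, pred: impl Fn(u64) -> bool) -> Vec<f64> {
    keys.chunks(window)
        .map(|chunk| chunk.iter().filter(|&&k| pred(k)).count() as f64 / chunk.len() as f64)
        .collect()
}

fn main() {
    let rows: u64 = 1_000; // stands in for PRED_ROWS
    let uniform: Vec<u64> = (0..rows).map(uniform_key).collect();
    let drifting: Vec<u64> = (0..rows).map(|i| drifting_key(i, rows)).collect();
    // prints [0.3, 0.3]
    println!("{:?}", windowed_pass_rate(&uniform, 500, |k| k < 30));
    // prints [0.1, 1.0]
    println!("{:?}", windowed_pass_rate(&drifting, 500, |k| k < 10));
    // prints 170
    println!("{}", payload(0, 170).len());
}
```

Measuring pass rates per window rather than over the whole table is what exposes drift: the overall rate of the drifting predicate is 55%, which describes neither half of the scan.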

The design was informed by surveying how Velox drives the analogous decision
(it ranks conjuncts by cycles per row eliminated, i.e. time / (rows_in - rows_out)).
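Velox's score and the cost/(1-sel) weight named in the commit message describe the same quantity: with per-row cost c = time / rows_in and selectivity sel = rows_out / rows_in, c / (1 - sel) = time / (rows_in - rows_out). The sketch below orders conjuncts by that score. It is not Velox's implementation; `ConjunctStats`, `rank`, and `example_stats` are hypothetical, and it assumes each conjunct's statistics were measured on the same input rows, which a real adaptive scheme would have to arrange or correct for.

```rust
/// Observed statistics for one conjunct (hypothetical struct).
struct ConjunctStats {
    name: &'static str,
    time_ns: f64,
    rows_in: u64,
    rows_out: u64,
}

impl ConjunctStats {
    /// time / (rows_in - rows_out): time spent per row eliminated.
    /// A conjunct that eliminates nothing scores +inf and sorts last.
    fn cost_per_row_eliminated(&self) -> f64 {
        let eliminated = self.rows_in.saturating_sub(self.rows_out);
        if eliminated == 0 {
            f64::INFINITY
        } else {
            self.time_ns / eliminated as f64
        }
    }
}

/// Orders conjunct names from cheapest to most expensive per row eliminated.
fn rank(mut stats: Vec<ConjunctStats>) -> Vec<&'static str> {
    stats.sort_by(|a, b| a.cost_per_row_eliminated().total_cmp(&b.cost_per_row_eliminated()));
    stats.into_iter().map(|s| s.name).collect()
}

/// Illustrative numbers only, all measured on the same 1,000 input rows.
fn example_stats() -> Vec<ConjunctStats> {
    vec![
        ConjunctStats { name: "cheap_unselective", time_ns: 1_000.0, rows_in: 1_000, rows_out: 990 },
        ConjunctStats { name: "neutral", time_ns: 500.0, rows_in: 1_000, rows_out: 1_000 },
        ConjunctStats { name: "expensive_selective", time_ns: 50_000.0, rows_in: 1_000, rows_out: 10 },
        ConjunctStats { name: "cheap_selective", time_ns: 1_000.0, rows_in: 1_000, rows_out: 100 },
    ]
}

fn main() {
    // prints ["cheap_selective", "expensive_selective", "cheap_unselective", "neutral"]
    println!("{:?}", rank(example_stats()));
}
```

With these numbers the expensive-but-selective conjunct (about 50.5 ns per row eliminated) ranks ahead of the cheap-but-unselective one (100 ns), which is the tension the costsel subgroup targets, while the conjunct that eliminates nothing sorts last.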

Note: the scale subgroup's q52/q53 build 5M / 50M-row tables (the latter
~9 GB); run a single point with BENCH_QUERY if that is too heavy.

Are these changes tested?

These are benchmark definitions, not engine code. Each .benchmark includes an
assert that the generated table is non-empty, and every subgroup was run
locally at small PRED_ROWS to confirm the suite parses, loads, asserts, and
executes end-to-end. The queries are order-invariant (SELECT count(*) ...), so
any predicate-ordering system can also be checked for correctness by diffing
counts with the optimization on vs. off.
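The order-invariance argument rests on SQL's three-valued AND being commutative and associative, and on WHERE keeping only rows whose conjunction is TRUE (FALSE and NULL are both dropped). The sketch below checks this on a small nullable table: a short-circuiting Kleene AND yields the same count for any permutation of the conjuncts. The rows, predicates, and helper names are hypothetical, and the argument assumes deterministic, side-effect-free predicates.

```rust
/// A row with two nullable integer columns (a, b).
type Row = (Option<i64>, Option<i64>);
/// A conjunct returns SQL TRUE / FALSE / NULL as Some(true) / Some(false) / None.
type Conjunct = fn(&Row) -> Option<bool>;

/// Kleene (three-valued) AND: FALSE wins, then NULL, else TRUE.
fn and3(x: Option<bool>, y: Option<bool>) -> Option<bool> {
    match (x, y) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// WHERE keeps a row only if the conjunction is TRUE. Evaluation stops
/// once the running result is FALSE, mirroring short-circuit AND.
fn where_keeps(row: &Row, conjuncts: &[Conjunct]) -> bool {
    let mut acc = Some(true);
    for p in conjuncts {
        if acc == Some(false) {
            break;
        }
        acc = and3(acc, p(row));
    }
    acc == Some(true)
}

/// Equivalent of SELECT count(*) ... WHERE c1 AND c2 AND ...
fn count_where(rows: &[Row], conjuncts: &[Conjunct]) -> usize {
    rows.iter().filter(|row| where_keeps(row, conjuncts)).count()
}

// Hypothetical conjuncts; a comparison against NULL yields NULL.
fn a_gt_10(r: &Row) -> Option<bool> { r.0.map(|a| a > 10) }
fn b_lt_50(r: &Row) -> Option<bool> { r.1.map(|b| b < 50) }
fn a_even(r: &Row) -> Option<bool> { r.0.map(|a| a % 2 == 0) }

/// Hypothetical sample: `a` is NULL every 5th row, `b` every 7th.
fn sample_rows() -> Vec<Row> {
    (0..200i64)
        .map(|i| {
            let a = if i % 5 == 0 { None } else { Some(i) };
            let b = if i % 7 == 0 { None } else { Some(i % 100) };
            (a, b)
        })
        .collect()
}

fn main() {
    let rows = sample_rows();
    let forward: [Conjunct; 3] = [a_gt_10, b_lt_50, a_even];
    let reversed: [Conjunct; 3] = [a_even, b_lt_50, a_gt_10];
    // Both lines print the same count.
    println!("forward:  {}", count_where(&rows, &forward));
    println!("reversed: {}", count_where(&rows, &reversed));
}
```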

Are there any user-facing changes?

No. This only adds an opt-in benchmark suite and its documentation; no public
API, engine behavior, or default configuration changes.

adriangb force-pushed the predicate-eval-benchmarks branch from 5ae3acf to a09a05a on June 2, 2026 04:33
…ilter evaluation

Add `benchmarks/sql_benchmarks/predicate_eval`, an implementation-agnostic
SQL benchmark suite that isolates the cost axes driving conjunctive (AND)
filter evaluation in FilterExec's left-deep short-circuit path: cost-weighted
ordering (cost/(1-sel)), per-predicate cost, selectivity, conjunct count,
string-column width, row count, predicate correlation, selectivity drift,
nulls, and an order-neutral overhead/regression guard.

The suite sets no engine config of its own. It measures DataFusion's built-in
short-circuit by default; a predicate-ordering system under test is toggled via
its native DATAFUSION_* env var (the bench harness builds its SessionContext
with SessionConfig::from_env). Synthetic data is generated inline by each
subgroup's load SQL and sized with PRED_ROWS / PRED_FILL.

Built on the existing .benchmark template framework (no engine code). Wired
into bench.sh (`./bench.sh run predicate_eval`) and documented in the
sql_benchmarks README.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
adriangb force-pushed the predicate-eval-benchmarks branch from a09a05a to 22c632f on June 2, 2026 04:41
adriangb marked this pull request as ready for review on June 2, 2026 04:46