@adriangb (Contributor) commented Nov 2, 2025

Background

This PR is part of an EPIC to push down hash table references from HashJoinExec into scans. The EPIC is tracked in #17171.

A "target state" is tracked in #18393.
A series of PRs gets us to this target state in smaller, more reviewable changes that are still valuable on their own:

Changes in this PR

  • Enhance InListExpr to efficiently store homogeneous lists as arrays and avoid a conversion to Vec<PhysicalExpr>
    by adding an internal InListStorage enum with Array and Exprs variants
  • Re-use existing hashing and comparison utilities to support Struct arrays and other complex types
  • Add public function in_list_from_array(expr, list_array, negated) for creating InList from arrays
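To make the storage idea above concrete, here is a minimal std-only sketch. It is not the DataFusion code: `Expr`, `Literal`, the `i64`-only `InListStorage`, and `toy_in_list_from_array` are hypothetical stand-ins, and NULL semantics are ignored entirely. It only illustrates why keeping a homogeneous constant list as one flat array (plus a set built once) can avoid wrapping every element in its own expression object.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a physical expression producing an i64 per row.
pub trait Expr {
    fn evaluate(&self, row: i64) -> i64;
}

/// A constant expression; it ignores the input row.
pub struct Literal(pub i64);

impl Expr for Literal {
    fn evaluate(&self, _row: i64) -> i64 {
        self.0
    }
}

/// Toy analogue of the two storage variants named in the PR description.
pub enum InListStorage {
    /// Homogeneous constant list kept as one flat array, with a set built once.
    Array { values: Vec<i64>, set: HashSet<i64> },
    /// General case: arbitrary expressions, evaluated for each probed row.
    Exprs(Vec<Box<dyn Expr>>),
}

impl InListStorage {
    /// Build the array-backed variant directly, with no per-element boxing.
    pub fn from_array(values: Vec<i64>) -> Self {
        let set = values.iter().copied().collect();
        InListStorage::Array { values, set }
    }

    /// Number of list entries, whichever way they are stored.
    pub fn len(&self) -> usize {
        match self {
            InListStorage::Array { values, .. } => values.len(),
            InListStorage::Exprs(exprs) => exprs.len(),
        }
    }

    /// Membership test: a set lookup for arrays, a linear scan for expressions.
    pub fn contains(&self, row: i64, probe: i64) -> bool {
        match self {
            InListStorage::Array { set, .. } => set.contains(&probe),
            InListStorage::Exprs(exprs) => exprs.iter().any(|e| e.evaluate(row) == probe),
        }
    }
}

/// Hypothetical, simplified counterpart of a "build an IN list from an array" helper.
pub fn toy_in_list_from_array(list: Vec<i64>, negated: bool) -> impl Fn(i64) -> bool {
    let storage = InListStorage::from_array(list);
    // The row argument is irrelevant for the array variant, so pass a dummy 0.
    move |probe| storage.contains(0, probe) != negated
}
```

In this sketch the `Exprs` variant stays available for lists whose entries are not constants, while the `Array` variant pays the set-building cost once instead of once per element wrapper.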

Although the diff looks large, most of it is tests and docs. I think the actual code change is a negative LOC change, or at least negative complexity (it eliminates a trait, a macro, and matching on data types).

@github-actions bot added the physical-expr, sqllogictest, common, proto, and physical-plan labels Nov 2, 2025
Comment on lines +318 to +320
// TODO: serialize the inner ArrayRef directly to avoid materialization into literals
// by extending the protobuf definition to support both representations and adding a public
// accessor method to InListExpr to get the inner ArrayRef
Contributor Author

I'll create a follow-up issue once we merge this.

  05)--------ProjectionExec: expr=[]
  06)----------CoalesceBatchesExec: target_batch_size=8192
- 07)------------FilterExec: substr(md5(CAST(value@0 AS Utf8View)), 1, 32) IN ([7f4b18de3cfeb9b4ac78c381ee2ad278, a, b, c])
+ 07)------------FilterExec: substr(md5(CAST(value@0 AS Utf8View)), 1, 32) IN (SET) ([7f4b18de3cfeb9b4ac78c381ee2ad278, a, b, c])
Contributor Author

This is because we now support Utf8View for building the sets 😄

Comment on lines +565 to +574
let random_state = RandomState::with_seed(0);
let mut hashes_buf = vec![0u64; array.len()];
let Ok(hashes_buf) = create_hashes_from_arrays(
    &[array.as_ref()],
    &random_state,
    &mut hashes_buf,
) else {
    unreachable!("Failed to create hashes for InList array. This shouldn't happen because make_set should have succeeded earlier.");
};
hashes_buf.hash(state);
Contributor Author

We could pre-compute and store a hash: u64. That would make Hash cheaper to call and remove this error path, but it would add complexity and some overhead when building the InListExpr.
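For illustration only, the pre-computed alternative could look roughly like the std-only sketch below. `CachedHashList` and `hash_of` are hypothetical names, a `Vec<String>` stands in for the Arrow array, and `DefaultHasher` stands in for whatever hashing the real code would use. The point is the shape of the trade-off: all hashing work moves into the constructor, and `Hash::hash` becomes a single infallible write.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper that hashes its contents once, at construction time.
pub struct CachedHashList {
    values: Vec<String>,
    /// Hash of `values`, computed eagerly in `new`.
    hash: u64,
}

impl CachedHashList {
    /// Pays the full hashing cost up front, even if `Hash` is never called.
    pub fn new(values: Vec<String>) -> Self {
        let mut hasher = DefaultHasher::new();
        values.hash(&mut hasher);
        let hash = hasher.finish();
        Self { values, hash }
    }

    pub fn values(&self) -> &[String] {
        &self.values
    }
}

impl Hash for CachedHashList {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Only the cached value is fed to the hasher, so this cannot fail
        // and does not depend on the length of the list.
        state.write_u64(self.hash);
    }
}

/// Small helper to obtain a u64 from any `Hash` value.
pub fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

The extra field and the eager work in `new` are the "more complexity and some overhead" mentioned in the comment; whether that is worth it depends on how often the expression is actually hashed.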

Change create_hashes and related functions to work with &dyn Array
references instead of requiring ArrayRef (Arc-wrapped arrays).
This avoids unnecessary Arc::clone() calls and enables calls that
only have an &dyn Array to use the hashing utilities.

Changes:
- Add create_hashes_from_arrays(&[&dyn Array]) function
- Refactor hash_dictionary, hash_list_array, hash_fixed_list_array
  to use references instead of cloning
- Extract hash_single_array() helper for common logic
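The borrowing change described in this commit message can be sketched with std-only toy types. Everything here is a hypothetical stand-in: the `Array` trait, `Int64Array`, `ArrayRef`, and both `toy_` functions are simplified models, and the row-combining arithmetic is a placeholder rather than the real hash function. The sketch shows the reference-based function as the core, with the `Arc`-based entry point as a thin wrapper that borrows instead of cloning.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow-style array of i64 values.
pub trait Array {
    fn values(&self) -> &[i64];
}

pub struct Int64Array(pub Vec<i64>);

impl Array for Int64Array {
    fn values(&self) -> &[i64] {
        &self.0
    }
}

/// Toy counterpart of an Arc-wrapped array handle.
pub type ArrayRef = Arc<dyn Array>;

/// Core routine: works on plain references, so a caller that only holds
/// a `&dyn Array` can use it without constructing or cloning an `Arc`.
pub fn toy_hash_rows_from_refs(arrays: &[&dyn Array], hashes: &mut [u64]) {
    for array in arrays {
        for (h, v) in hashes.iter_mut().zip(array.values()) {
            // Placeholder combining step, not a real hash function.
            *h = h.wrapping_mul(31).wrapping_add(*v as u64);
        }
    }
}

/// Arc-based entry point, kept as a thin wrapper that only borrows.
pub fn toy_hash_rows(arrays: &[ArrayRef], hashes: &mut [u64]) {
    let refs: Vec<&dyn Array> = arrays.iter().map(|a| a.as_ref()).collect();
    toy_hash_rows_from_refs(&refs, hashes);
}
```

With this layout, both call styles share one implementation, and no reference counts are touched on the hashing path.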

Changes:
- Enhance InListExpr to efficiently store homogeneous lists as arrays and avoid a conversion to Vec<PhysicalExpr>
  by adding an internal InListStorage enum with Array and Exprs variants
- Re-use existing hashing and comparison utilities to support Struct arrays and other complex types
- Add public function `in_list_from_array(expr, list_array, negated)` for creating InList from arrays
