fix: correct cross join byte size statistics#22700

Merged
Dandandan merged 1 commit into
apache:mainfrom
neilconway:neilc/fix-cross-join-byte-size
Jun 2, 2026

Conversation

Contributor

@neilconway neilconway commented Jun 1, 2026

Which issue does this PR close?

Rationale for this change

stats_cartesian_product computes the total byte size of a cross join as:

      let total_byte_size = left_stats
          .total_byte_size
          .multiply(&right_stats.total_byte_size)
          .multiply(&Precision::Exact(2));

This is wrong: it multiplies two byte-size values together, which yields a quantity in bytes squared rather than bytes. The correct formula is "left-num-rows * right-size-in-bytes + right-num-rows * left-size-in-bytes", since every row on the left side is repeated once per row on the right, and vice versa.
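To make the arithmetic concrete, here is a minimal sketch of both formulas using plain integers. It is not the DataFusion implementation: the function names `cross_join_byte_size` and `old_cross_join_byte_size` are hypothetical, and the sketch uses `Option<u64>` with checked arithmetic as a stand-in for the `Precision` type, so it does not model exact versus inexact statistics.

```rust
/// Hypothetical helper (not part of DataFusion): estimated output size in
/// bytes of a cross join, computed from plain integer statistics.
/// Returns `None` if the arithmetic overflows `u64`.
fn cross_join_byte_size(
    left_rows: u64,
    left_bytes: u64,
    right_rows: u64,
    right_bytes: u64,
) -> Option<u64> {
    // All left-side bytes are emitted once per right-side row.
    let left_part = left_bytes.checked_mul(right_rows)?;
    // All right-side bytes are emitted once per left-side row.
    let right_part = right_bytes.checked_mul(left_rows)?;
    left_part.checked_add(right_part)
}

/// The previous formula, kept only for comparison: bytes * bytes * 2.
/// The result is not a byte count, because two byte sizes are multiplied.
fn old_cross_join_byte_size(left_bytes: u64, right_bytes: u64) -> Option<u64> {
    left_bytes.checked_mul(right_bytes)?.checked_mul(2)
}
```

For example, take a left input of 3 rows totalling 24 bytes and a right input of 2 rows totalling 8 bytes. The join produces 6 rows of 8 + 4 = 12 bytes each, so 72 bytes in total. `cross_join_byte_size(3, 24, 2, 8)` returns `Some(72)`, whereas `old_cross_join_byte_size(24, 8)` returns `Some(384)`.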

What changes are included in this PR?

  • Fix total byte size formula for cross join
  • Update expected SLT results

Are these changes tested?

Yes; covered by existing tests.

Are there any user-facing changes?

No.

@github-actions (bot) added the labels core (Core DataFusion crate) and physical-plan (Changes to the physical-plan crate) on Jun 1, 2026
Member

@asolimando asolimando left a comment

LGTM, indeed the updated formula is more accurate for cross join

Contributor

@adriangb adriangb left a comment

Thanks @neilconway

@Dandandan Dandandan added this pull request to the merge queue Jun 2, 2026
Merged via the queue into apache:main with commit 766096a Jun 2, 2026
39 checks passed

Development

Successfully merging this pull request may close these issues.

stats_cartesian_product computes incorrect byte size for cross join

4 participants