fix: Taking slicing into account when writing BooleanBuffers as fast-encoding format #1522

Open
wants to merge 1 commit into
base: main

Kontinuation
Member

@Kontinuation Kontinuation commented Mar 13, 2025

Which issue does this PR close?

Closes #1520.

Rationale for this change

I found this problem while working on #1511: the null bits were not written correctly, which caused test failures. This patch attempts to fix it.

This patch only aims to fix correctness problems. As #1190 (comment) pointed out, the fast BatchWriter may write the full data buffer for sliced Utf8 arrays, so there are still performance implications when working with sliced arrays.
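
To illustrate the remaining caveat, here is a minimal, self-contained sketch of what writing only the referenced bytes of a sliced Utf8 array could look like. It is not part of this patch and not the BatchWriter code; `compact_utf8_slice` is a hypothetical helper, and the offsets and data are modeled as plain slices rather than Arrow buffers.

```rust
/// Hypothetical helper (not part of this patch): given the offsets and data
/// buffers of a Utf8 array and a slice `[start, start + len)`, return offsets
/// rebased to zero together with only the bytes the slice refers to.
fn compact_utf8_slice(
    offsets: &[i32],
    data: &[u8],
    start: usize,
    len: usize,
) -> (Vec<i32>, Vec<u8>) {
    // A slice of `len` values needs `len + 1` offsets.
    let window = &offsets[start..=start + len];
    let base = window[0];
    let end = window[len];
    // Shift every offset so that the first value starts at byte 0.
    let rebased: Vec<i32> = window.iter().map(|o| o - base).collect();
    // Copy only the byte range covered by the slice, not the full buffer.
    (rebased, data[base as usize..end as usize].to_vec())
}
```

Writing the full data buffer stays correct as long as the original offsets are written alongside it, but it costs extra bytes whenever the slice covers only a small part of the array.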

What changes are included in this PR?

Take the slice offset and length into account when writing BooleanBuffers. This applies to the null bits of all arrays and to the values of boolean arrays.
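
The idea can be sketched without the Arrow crate. The snippet below is an illustration under stated assumptions, not the code in this PR: the bitmap is modeled as a plain byte slice with Arrow's LSB-first bit numbering, and `get_bit` and `pack_sliced_bits` are hypothetical names.

```rust
/// Read bit `i` from an LSB-first packed bitmap (the bit order Arrow uses).
fn get_bit(bytes: &[u8], i: usize) -> bool {
    (bytes[i / 8] >> (i % 8)) & 1 == 1
}

/// Copy `len` bits starting at bit `offset` into a fresh bitmap whose first
/// bit corresponds to the first element of the slice.
fn pack_sliced_bits(bytes: &[u8], offset: usize, len: usize) -> Vec<u8> {
    // Round up to whole bytes; unused trailing bits stay zero.
    let mut out = vec![0u8; (len + 7) / 8];
    for i in 0..len {
        if get_bit(bytes, offset + i) {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}
```

A writer that ignores the offset and copies the underlying bytes from the start would emit the bits of elements outside the slice, so a reader would see the wrong null bits. Repacking from `offset` for `len` bits avoids that, at the cost of a bit-by-bit copy when the offset is not byte-aligned.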

How are these changes tested?

Added a new round-trip test for sliced record batches.

@Kontinuation Kontinuation force-pushed the fix-codec-write-nulls branch from 1816f03 to 05a4b04 Compare March 13, 2025 04:06
@Kontinuation Kontinuation marked this pull request as ready for review March 13, 2025 04:50
@codecov-commenter

codecov-commenter commented Mar 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.97%. Comparing base (f09f8af) to head (05a4b04).
Report is 74 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1522      +/-   ##
============================================
+ Coverage     56.12%   58.97%   +2.84%     
- Complexity      976     1028      +52     
============================================
  Files           119      122       +3     
  Lines         11743    12268     +525     
  Branches       2251     2309      +58     
============================================
+ Hits           6591     7235     +644     
+ Misses         4012     3875     -137     
- Partials       1140     1158      +18     

Development

Successfully merging this pull request may close these issues.

Incorrect null handling in fast shuffle encoder
2 participants