-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-43660: [C++][Compute] Avoid ZeroCopyCastExec when casting Binary offset -> Binary offset types #48171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great improvement! But you should add tests of casting of [Large]String containing non-zero and >8 offsets. I don't think your current implementation would work without also slicing the offsets buffer.
| } else { | ||
| output->offset = 0; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to make sure all cases are covered correctly here. enable_if_t<is_base_binary_type<I>::value && is_base_binary_type<O>::value, Status> covers all pairs of Binary/LargeBinary/String/LargeString.
if constexpr (!I::is_utf8 && O::is_utf8) { handles the UTF8 validation in case we are going from {Binary, LargeBinary} to {String, LargeString}.
The allocation issue is only a problem when going from 32 to 64 or 64 to 32, so you can shape your new code like this:
if constexpr (sizeof(typename I::offset_type) != sizeof(typename O::offset_type)) {
std::shared_ptr<ArrayData> input_arr = input.ToArrayData();
ArrayData* output = out->array_data().get();
// ...
if (output->buffers[0]) {
...
return CastBinaryToBinaryOffsets<typename I::offset_type, typename O::offset_type>(
ctx, input, out->array_data().get());
} else {
return ZeroCopyCastExec(ctx, batch, out);
}Useful diagram I made some time ago: https://gist.github.com/felipecrv/3c02f3784221d946dec1b031c6d400db
| output->length = input_arr->length; | ||
| output->SetNullCount(input_arr->null_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| output->length = input_arr->length; | |
| output->SetNullCount(input_arr->null_count); | |
| output->length = input_arr->length; | |
| // output->offset is set below | |
| output->SetNullCount(input_arr->null_count); |
| output->length = input_arr->length; | ||
| output->SetNullCount(input_arr->null_count); | ||
| output->buffers = std::move(input_arr->buffers); | ||
| output->child_data = std::move(input_arr->child_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| output->child_data = std::move(input_arr->child_data); | |
| // binary/string arrays don't have child_data |
| if (output->buffers[0]) { | ||
| // If reusing the null bitmap, ensure offset into the first byte is the same as input. | ||
| output->offset = input_arr->offset % 8; | ||
| output->buffers[0] = SliceBuffer(output->buffers[0], input_arr->offset / 8); | ||
| } else { | ||
| output->offset = 0; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you should also slice the offsets buffer when you change the array offset.
| if (output->buffers[0]) { | |
| // If reusing the null bitmap, ensure offset into the first byte is the same as input. | |
| output->offset = input_arr->offset % 8; | |
| output->buffers[0] = SliceBuffer(output->buffers[0], input_arr->offset / 8); | |
| } else { | |
| output->offset = 0; | |
| } | |
| // slice buffers to reduce allocation when casting the offsets buffer | |
| auto length = input_arr->length; | |
| int64_t buffer_offset = 0; | |
| if (output->null_count != 0 && output->buffers[0]) { | |
| // avoiding reallocation of the validity buffer by allowing some padding bits | |
| output->offset = input_arr->offset % 8; | |
| buffer_offset = input_arr->offset / 8; | |
| } else { | |
| output->offset = 0; | |
| buffer_offset = input_arr->offset; | |
| } | |
| output->buffers[0] = SliceBuffer(output->buffers[0], buffer_offset, length); | |
| output->buffers[1] = SliceBuffer(output->buffers[1], buffer_offset, length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please review my suggestions and tell if something is weird. This stuff is hard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestions! I didn't slice the offset buffers originally because they are reallocated by CastBinaryToBinaryOffsets with the correct offset. Although, this would only be true for the case where the offset type changes, in the case where the offset stays the same then we would either have to slice here or call ZeroCopyCastExec like in the comment above?
Rationale for this change
Casting Binary offset -> Binary offset types relies on ZeroCopyCastExec, which propagates the offset of the input to the output. This can lead to larger allocations than necessary when casting arrays with offsets.
See #43660 and
#43661 for more context.
What changes are included in this PR?
Ensure output array has a small offset (it can still be non-zero since reusing the null bitmap requires in_offset % 8 == out_offset % 8)
Are these changes tested?
Ran unit tests and benchmarked locally.
Are there any user-facing changes?
No