[Not For Merge] Remove null-suppression in block serialization #26550
Updated from 3fd15ce to 595c265.
Can you elaborate on why?

IIUC, you are now serializing nulls twice: first just the null positions, and then all the values, including the nulls.
Block serialization currently null-suppresses rows (i.e., null values are removed from the serialized output and then reconstructed during deserialization by packing and unpacking according to the null masks). This design assumes that letting the CPU perform null suppression, and thus transferring less data over the network, is faster than skipping null suppression and transferring more data. However, this assumption may no longer hold given the improvements in modern high-throughput, low-latency networks. This is especially true when nulls are rare: the CPU cost is then almost certainly not time well spent. I am currently running microbenchmarks and end-to-end tests for this change and will update this PR when I have more experimental results. I submitted this PR only to run the test framework and help catch bugs; I should have put [Not For Merge] in the title, sorry about that.
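To make the trade-off concrete, here is a minimal Java sketch of the two layouts for a single int column. It assumes a simplified block made of an `int[]` of values plus a `boolean[]` null mask; the class and method names (`NullSuppressionSketch`, `suppressNulls`, `expandNulls`, `valueBytes`) are hypothetical and are not Trino's actual `BlockEncoding` code. With suppression, only non-null values are written and the reader re-expands them using the mask; without it, every position is written, which is the "serializing nulls twice" concern raised in review.

```java
import java.util.Arrays;

// Hypothetical sketch: a simplified int column with a null mask.
// Not Trino's actual BlockEncoding implementation.
final class NullSuppressionSketch {

    // Suppressed layout: keep only the values at non-null positions.
    static int[] suppressNulls(int[] values, boolean[] isNull) {
        int nonNullCount = 0;
        for (boolean nullFlag : isNull) {
            if (!nullFlag) {
                nonNullCount++;
            }
        }
        int[] packed = new int[nonNullCount];
        int out = 0;
        for (int i = 0; i < values.length; i++) {
            if (!isNull[i]) {
                packed[out++] = values[i];
            }
        }
        return packed;
    }

    // Deserialization: expand packed values back to full positions using the mask.
    static int[] expandNulls(int[] packed, boolean[] isNull) {
        int[] values = new int[isNull.length];
        int in = 0;
        for (int i = 0; i < isNull.length; i++) {
            // Null positions get a placeholder 0; the mask remains the source of truth.
            values[i] = isNull[i] ? 0 : packed[in++];
        }
        return values;
    }

    // Bytes for the value section only; the null mask is written in both layouts.
    static int valueBytes(int positionCount) {
        return positionCount * Integer.BYTES;
    }

    public static void main(String[] args) {
        int[] values = {7, 0, 42, 0, 5};
        boolean[] isNull = {false, true, false, true, false};

        int[] packed = suppressNulls(values, isNull);
        int[] restored = expandNulls(packed, isNull);

        System.out.println(Arrays.toString(packed));   // [7, 42, 5]
        System.out.println(Arrays.toString(restored)); // [7, 0, 42, 0, 5]
        System.out.println("suppressed value bytes: " + valueBytes(packed.length));   // 12
        System.out.println("unsuppressed value bytes: " + valueBytes(values.length)); // 20
    }
}
```

The suppressed layout saves `4 * nullCount` bytes per int column in this model, at the cost of the counting and packing loops on the writer and the expansion loop on the reader; when nulls are rare, the saving is small while the loops still touch every position.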
Replied here: #26550 (comment)
Updated from 595c265 to def9856.
I ran some experiments with shuffle-heavy queries to test whether the change improves performance, but we saw some regressions (see #26550 (comment)). So the assumption stated in #26550 (comment), that letting the CPU perform null suppression and thus transferring less data over the network is faster than skipping suppression and transferring more data, still holds. We therefore don't want to ship this change. Instead, we chose to use SIMD to optimize null suppression (#26919). Closing this PR.
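One common way to make null-suppression compaction friendlier to SIMD and to the CPU's branch predictor is to remove the data-dependent branch: store every value unconditionally and advance the output index only for non-null positions. The Java sketch below shows that scalar, branch-free formulation. It is not necessarily how #26919 implements its SIMD optimization; it assumes the null mask is a hypothetical `byte[]` of 0/1 flags (1 = null) so the index can be advanced arithmetically, and the class and method names are illustrative only.

```java
import java.util.Arrays;

// Hypothetical sketch of branch-free null suppression (compaction).
// Not the code from #26919; it only illustrates the idea.
final class BranchFreeCompaction {

    // Copies non-null values to the front of `out` and returns how many were kept.
    // `out` must be at least as long as `values`, because every position is stored.
    static int compact(int[] values, byte[] isNull, int[] out) {
        int outIndex = 0;
        for (int i = 0; i < values.length; i++) {
            // Unconditional store; a null slot is overwritten by the next kept value.
            out[outIndex] = values[i];
            // isNull[i] is 0 or 1, so this advances by 1 only for non-null positions.
            outIndex += 1 - isNull[i];
        }
        return outIndex;
    }

    public static void main(String[] args) {
        int[] values = {7, 0, 42, 0, 5};
        byte[] isNull = {0, 1, 0, 1, 0}; // 1 marks a null position
        int[] out = new int[values.length];

        int count = compact(values, isNull, out);
        System.out.println(count);                                      // 3
        System.out.println(Arrays.toString(Arrays.copyOf(out, count))); // [7, 42, 5]
    }
}
```

The design choice is to trade a few wasted stores for a loop body with no unpredictable jump, which costs roughly the same whether nulls are rare or common; a real vectorized version would process several positions per instruction, for example with a compress-style operation.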
Attachments: test query table DDL, table creation DML, test query, non-suppressed result, and suppressed result.

The suppressed result still shows better performance.
Description
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: