Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pqarrow/arrowutils: Add sorting support for Struct, RunEndEncoded, FixedSizeBinary #936

Merged
merged 9 commits into from
Nov 26, 2024

Conversation

metalmatze
Copy link
Member

The RunEndEncoded part made my brain melt a couple of times. The Struct part is weird cause it always assumes the fields of a struct a multiple when creating the new struct array with array.NewStructArray(cols, names), so I had to figure that out and adjust the schema. FixedSizeBinary is simple compared to the other two.

@metalmatze metalmatze force-pushed the arrowutils-sort-dict-fixedbinary branch 4 times, most recently from 4751bc3 to 6e043a8 Compare November 25, 2024 15:38
@metalmatze
Copy link
Member Author

Hey @thorfour, any idea why Test_Sampler_MaxSizeAllocation is failing for this PR? It seems pretty unrelated.

@thorfour
Copy link
Contributor

Hey @thorfour, any idea why Test_Sampler_MaxSizeAllocation is failing for this PR? It seems pretty unrelated.

Replace the assert with LessOrEqual instead of Equal

@metalmatze
Copy link
Member Author

Done: dd0ae83 (#936)
Let's see.

@metalmatze metalmatze force-pushed the arrowutils-sort-dict-fixedbinary branch from dd0ae83 to 1ff59c3 Compare November 26, 2024 11:25
This will also allow us to limit all the newly supported column types, timestamp, struct and runendencoded
Please take a look at this fix, @thorfour. I'm not sure why this started panicking now.
Comment on lines +177 to +180
// The size can be 0 and in that case we don't want to sample.
if s.size == 0 {
return
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thorfour, this sometimes panics all of a sudden.
Specifically line 189/193 with the rand.Intn(int(s.size).
I'm not sure why we're sometimes running into this now. But this guard at least stops the panic.

t.Run("List", func(t *testing.T) {
mem := memory.NewGoAllocator()
// mem := memory.NewCheckedAllocator(memory.NewGoAllocator())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I tried to fix it and gave up. Let me take another look.

@metalmatze metalmatze merged commit e336b3a into main Nov 26, 2024
8 checks passed
@metalmatze metalmatze deleted the arrowutils-sort-dict-fixedbinary branch November 26, 2024 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants