-
Notifications
You must be signed in to change notification settings - Fork 988
Reduce overhead when computing compound hash-based groupby aggregations #20736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
| std::vector<std::unique_ptr<aggregation>> visit( | ||
| data_type, cudf::detail::correlation_aggregation const&) override | ||
| { | ||
| std::vector<std::unique_ptr<aggregation>> aggs; | ||
| aggs.push_back(make_sum_aggregation()); | ||
| // COUNT_VALID | ||
| aggs.push_back(make_count_aggregation()); | ||
|
|
||
| return aggs; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused code: correlation aggregation is not yet supported in hash-based pipeline.
| // If it is not an input aggregation, we can force its output to be non-nullable | ||
| // (by storing `1` value in the `force_non_nullable` vector). | ||
| auto const is_input_agg = input_agg_kinds_set.contains(agg->kind); | ||
| force_non_nullable.push_back(is_input_agg ? 0 : 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| force_non_nullable.push_back(is_input_agg ? 0 : 1); | |
| force_non_nullable.push_back(not is_input_agg); |
Using int (4-bytes) for this vector seems like a waste. Perhaps using int8 instead. Using bool will not work with spans unfortunately.
I also recommend using std::span instead of cudf::host_span if possible.
We can revisit the other host_span usages here in a later PR. Reference: #20539
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the code to use int8_t.
I also tried using cuda::std::span but couldn't compile due to:
error: no suitable user-defined conversion from "std::tuple_element<1UL, const std::tuple<cudf::table_view, cudf::detail::host_vector<cudf::aggregation::Kind>, std::vector<std::unique_ptr<cudf::aggregation, std::default_delete<cudf::aggregation>>, std::allocator<std::unique_ptr<cudf::aggregation, std::default_delete<cudf::aggregation>>>>, std::vector<int8_t, std::allocator<int8_t>>, bool>>::type"
(aka "const std::__tuple_element_t<1UL, std::tuple<cudf::table_view, cudf::detail::host_vector<cudf::aggregation::Kind>, std::vector<std::unique_ptr<cudf::aggregation, std::default_delete<cudf::aggregation>>, std::allocator<std::unique_ptr<cudf::aggregation, std::default_delete<cudf::aggregation>>>>, std::vector<signed char, std::allocator<signed char>>, bool>>")
to "cuda::std::__4::span<const cudf::aggregation::Kind, 18446744073709551615UL>" exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is host-only, the std::span should be enough. No need for cuda::std::span.
I'm not sure if that will help this though. I can look at this more closely next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I found that the issue is due to implicit converting cudf::detail::host_vector to (cuda::)std::span. The implicit conversion is not implemented yet, so we should do that soon.
When trying to use std::span in place of host_span, the modification chains down to nearly dozen of unrelated files thus I would rather have std::span adoption implemented in other separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I tried std::span locally and only had to change 6 files in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The host_vector should not come into play. Are you trying to change all of the host_span usages?
I would recommend to only change the host_span usage for this new int8 variable and we'll change any of the other host_span usages in a follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I tried to change host_span for this new variable and another one, since they are passing to the same function:
std::unique_ptr<table> create_results_table(size_type output_size,
table_view const& values,
host_span<aggregation::Kind const> agg_kinds,
host_span<int8_t const> force_non_nullable,
so it makes more sense to change both together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. But it would require too many changes.
Whereas changing just the one for this PR will be enough for it to pass.
Please just change the one variable. We can change the other one(s) in a later PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. std::span looks great now. Excited waiting for the full adoption 👍
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
|
Benchmark for For |
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
During computing the compound aggregations in hash-based groupby, we may perform unnecessary work or repeat the same computation multiple times. For example, we may output nulls for the aggregation results and launch a kernel to count nulls for them, although they are computed only to serve as intermediate data for other compound aggregation and their null masks/null counts are never accessed.
In the situations when we have many aggregations (hundred+ to thousand+ and up), such repeated or unnecessary work accumulates to a significant runtime overhead. As such, we want to reduce them as little as possible. This PR intends to do so, modifying the existing hash-based groupby aggregation framework to allow ignoring nulls for the aggregation outputs. By doing so, the results of aggregations requested by the users can have nulls as before (no change) while the aggregations generated only to serve as intermediate data for computing compound aggregations are always non-nullable.
Closes #20734.