Conversation

@PointKernel
Member

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.


copy-pr-bot bot commented Nov 26, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue labels Nov 26, 2025
Comment on lines +106 to +108
hll.add(hash_values.begin(), hash_values.end(), cuda::stream_ref{stream.value()});
return static_cast<cudf::size_type>(hll.estimate(cuda::stream_ref{stream.value()}));
}
Contributor


Thinking about multi-GPU approx distinct count: I believe that two sketches can be combined by a binary merge operator, and that this operator commutes with the estimate function, i.e. (hll(A) + hll(B)).estimate() == hll(A + B).estimate().

To produce a global approx distinct count from the GPU-local ones, I need to do this merge.

Can you provide an interface to return the hll.sketch_bytes() as an object that I can then combine with another sketch that was constructed using the same hashing scheme and approximation size?

Perhaps, spitballing:

std::unique_ptr<rmm::device_buffer> approx_distinct_count_sketch(args_as_for_approx_distinct_count);

std::unique_ptr<rmm::device_buffer> merge_sketches(std::span<rmm::device_buffer> sketches);

?

Member Author


We do have merge APIs in HLL that address this need: https://github.com/NVIDIA/cuCollections/blob/d36905c69ce02d74abdd31dc864ce3e1ffc5a7db/include/cuco/hyperloglog.cuh#L159-L221. The question is really about how to surface this capability in libcudf. One idea I had is to expose a class like approx_estimator in libcudf so users can perform custom operations such as merge. However, that class would essentially just wrap cuco::hyperloglog, meaning that for multi-GPU scenarios users could simply use cuco::hyperloglog directly without needing any cudf abstraction. Does that sound reasonable, or is there something I’m overlooking?

Member Author


On second thought, exposing an object-oriented estimator instead of the current free function is likely the better approach. It offers significantly more flexibility, and given the complexity of row operations and null/NaN handling, asking users to manage those aspects themselves would be error-prone.

Contributor


Yeah, I think NaN/null handling should be provided by us rather than the end user. I haven't yet looked at all the row_hasher APIs either; do we expose those in the public interface?

Member Author


That's a very good point. The row operators reside in the detail namespace.

