@mhaseeb123
Member

Description

This PR implements row group filtering by byte range in the new experimental Parquet reader, reusing the existing filtering APIs from the main Parquet reader.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mhaseeb123 mhaseeb123 requested a review from a team as a code owner November 25, 2025 19:06
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Nov 25, 2025
void parquet_reader_options::set_skip_bytes(size_t val)
{
-  CUDF_EXPECTS(val == 0 or std::cmp_equal(_source.num_sources(), 1),
+  CUDF_EXPECTS(val == 0 or std::cmp_less_equal(_source.num_sources(), 1),
Member Author

@mhaseeb123 mhaseeb123 Nov 25, 2025

The hybrid scan reader has no internal source, so `num_sources()` can be 0 there. This relaxes the condition from exactly one source to at most one source.

@mhaseeb123 mhaseeb123 added feature request New feature or request 3 - Ready for Review Ready for review by team non-breaking Non-breaking change Velox Functionality that helps Velox-cudf labels Nov 25, 2025
* @param options Parquet reader options
* @return Filtered row group indices
*/
[[nodiscard]] std::vector<size_type> filter_row_groups_with_byte_range(
Member Author

New public API

auto const input_row_group_indices =
std::vector<std::vector<size_type>>{{row_group_indices.begin(), row_group_indices.end()}};

return _impl->filter_row_groups_with_byte_range(input_row_group_indices, options).front();
Member Author

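Wrap the single list of row group indices in a one-element outer vector, call the impl API, and return the first (and only) result.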

Contributor

@davidwendt davidwendt left a comment

LGTM

Labels

3 - Ready for Review Ready for review by team feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Velox Functionality that helps Velox-cudf

2 participants