DocumentDataset
Is your feature request related to a problem? Please describe.
(Not urgent, since we have to spill to host memory anyway, but we might benefit from faster I/O and dataset filtering, e.g. in #417.)
I noticed an oddity in the PII examples / scripts / docs: PII doesn't work when we do DocDataset.read_*(backend="cudf"). This is odd given that all of the examples / scripts / docs read the dataset using dask (pandas) but pass device='gpu' to the Modifier.
Describe the solution you'd like
The code works with DocumentDataset('cudf'). I think we might just need to_pyarrow().tolist() when the series is a cudf type.
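A rough sketch of what that conversion could look like, not the project's actual code. The helper name `series_to_host_list` is hypothetical, and detecting a cudf series by checking for a `to_pyarrow` method is an assumption made here so the snippet runs without cudf installed:

```python
from typing import Any, List


def series_to_host_list(series: Any) -> List[Any]:
    """Hypothetical helper: return a Series' values as a plain Python list.

    Assumption: a cudf Series is recognised by having a to_pyarrow() method,
    which a pandas Series does not provide.
    """
    if hasattr(series, "to_pyarrow"):
        # cudf path: copy the device data to a host pyarrow Array,
        # then convert that Array to a Python list.
        return series.to_pyarrow().tolist()
    # pandas path: the data is already on the host.
    return series.tolist()
```

The PII code could then call this helper wherever it currently converts a partition's text column to a list, so the same path would serve both the pandas and cudf backends.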