Pull requests: NVIDIA/NeMo-Curator
Pull requests list
Add tests/test_classifiers.py PyTests
gpuci
Run GPU CI/CD on PR
#421
opened Dec 11, 2024 by
sarahyurick
•
Draft
2
Create notebook tutorials for distributed data classifiers
documentation
Improvements or additions to documentation
#415
opened Dec 6, 2024 by
sarahyurick
[WIP] Efficient Exact Duplicate Removal Code
#404
opened Dec 2, 2024 by
praateekmahajan
•
Draft
3 tasks
Add documentation for Instruction-Data-Guard classifier
#398
opened Nov 27, 2024 by
sarahyurick
Fix GPU error messages for fuzzy deduplication
#387
opened Nov 22, 2024 by
sarahyurick
•
Draft
1 of 2 tasks
Fuzzy Dedup: Make skipping the False positive check the default
enhancement
New feature or request
gpuci
Run GPU CI/CD on PR
#386
opened Nov 21, 2024 by
ayushdg
2 of 3 tasks
Remove max_text_bytes_per_part
gpuci
Run GPU CI/CD on PR
#385
opened Nov 20, 2024 by
sarahyurick
Global cache_dir variable for exact, fuzzy, and semantic deduplication
gpuci
Run GPU CI/CD on PR
#384
opened Nov 19, 2024 by
sarahyurick
3 tasks done
Convert translation_example.py into a Jupyter Notebook tutorial
#336
opened Oct 29, 2024 by
sarahyurick
•
Draft
Dapt data curation tutorial fuzzy and semantic dedupe
gpuci
Run GPU CI/CD on PR
#322
opened Oct 24, 2024 by
ruchaa-apte
Add blocksize to DocumentDataset.read_* that uses dask_cudf.read_*
#285
opened Oct 8, 2024 by
praateekmahajan
3 tasks
Added example notebook for translation with ct2 model.
documentation
Improvements or additions to documentation
Adding an example for executing NeMo modules using kubernetes Python …
documentation
Improvements or additions to documentation
#148
opened Jul 9, 2024 by
dpadmanabhan03
2 of 3 tasks
Fix #53 - Add batched files reading support to separate_by_metadata script
#54
opened May 6, 2024 by
miguelusque