[tmva] Implementation of a new shuffling strategy in RBatchGenerator #20071
… in the dataframe
Test Results: 22 files, 22 suites, 3d 16h 2m 25s ⏱️. For more details on these failures, see this check. Results for commit 57396ba. ♻️ This comment has been updated with latest results.
Thanks a lot for this great work! This is a first iteration of comments from my side.
bindings/pyroot/pythonizations/python/ROOT/_pythonization/_tmva/_batchgenerator.py (resolved)
bindings/pyroot/pythonizations/python/ROOT/_pythonization/_tmva/_batchgenerator.py (resolved)
bindings/pyroot/pythonizations/python/ROOT/_pythonization/_tmva/_batchgenerator.py (outdated, resolved)
bindings/pyroot/pythonizations/python/ROOT/_pythonization/_tmva/_batchgenerator.py (outdated, resolved)
bindings/pyroot/pythonizations/test/rbatchgenerator_completeness.py (outdated, resolved)
Thank you for the review @vepadulano! I have implemented the changes you suggested and left comments wherever it was unclear what the code was doing.
This Pull request:
Introduces a new shuffling strategy for creating training batches that ensures each batch contains data from different parts of the RDataFrame. Each chunk loaded into memory, from which the batches are created, is now assembled from blocks of entries drawn from different parts of the dataframe rather than from a single contiguous range.
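A minimal Python sketch of the block-based idea, for illustration only. It assumes entries are addressed by integer index, that the dataset is split into contiguous blocks of equal size (the last may be shorter), and that each chunk is built from a fixed number of randomly chosen blocks. The functions `make_blocks`, `make_chunks`, `make_batches` and the parameters `block_size`, `blocks_per_chunk`, `batch_size` are hypothetical and are not the RBatchGenerator API.

```python
import random

def make_blocks(num_entries, block_size):
    """Split entry indices [0, num_entries) into contiguous blocks (hypothetical helper)."""
    return [range(start, min(start + block_size, num_entries))
            for start in range(0, num_entries, block_size)]

def make_chunks(num_entries, block_size, blocks_per_chunk, rng):
    """Group randomly ordered blocks into chunks, so one chunk mixes distant parts of the data."""
    blocks = make_blocks(num_entries, block_size)
    rng.shuffle(blocks)  # random block order: neighbouring blocks in a chunk come from anywhere
    chunks = []
    for i in range(0, len(blocks), blocks_per_chunk):
        # Flatten the selected blocks into one list of entry indices (the in-memory chunk)
        chunks.append([idx for block in blocks[i:i + blocks_per_chunk] for idx in block])
    return chunks

def make_batches(chunk, batch_size, rng):
    """Shuffle the entries of one chunk and cut them into batches."""
    entries = list(chunk)
    rng.shuffle(entries)
    return [entries[i:i + batch_size] for i in range(0, len(entries), batch_size)]

rng = random.Random(0)
chunks = make_chunks(num_entries=100, block_size=10, blocks_per_chunk=3, rng=rng)
batches = make_batches(chunks[0], batch_size=8, rng=rng)
```

With 100 entries, blocks of 10 and 3 blocks per chunk, this produces four chunks of 30, 30, 30 and 10 entries. Every entry appears in exactly one chunk, and each batch drawn from a chunk can mix entries from up to three distant regions of the dataset.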