Conversation

martinfoell
Contributor

This pull request:

Introduces a new shuffling strategy for creating training batches, so that each batch draws its entries from different parts of the RDataFrame. Each chunk loaded into memory, from which the batches are created, is now assembled from blocks of data taken from different parts of the dataframe.
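
To illustrate the idea, here is a minimal sketch of block-wise chunk shuffling on plain entry indices. It is not the code in this PR: the function names (`make_chunks`, `iter_batches`) and the parameters (`block_size`, `blocks_per_chunk`, `batch_size`) are hypothetical, and it assumes the dataset can be addressed by entry index in the range `[0, n_entries)`.

```python
import random


def make_chunks(n_entries, block_size, blocks_per_chunk, rng):
    """Group shuffled contiguous blocks into chunks (hypothetical helper).

    The entry range is cut into contiguous blocks, the block order is
    shuffled, and consecutive blocks in the shuffled order form one chunk,
    so a chunk mixes entries from distant regions of the dataset.
    """
    blocks = [
        list(range(start, min(start + block_size, n_entries)))
        for start in range(0, n_entries, block_size)
    ]
    rng.shuffle(blocks)  # shuffle the order of the blocks, not their contents

    chunks = []
    for i in range(0, len(blocks), blocks_per_chunk):
        chunk = [idx for block in blocks[i:i + blocks_per_chunk] for idx in block]
        chunks.append(chunk)
    return chunks


def iter_batches(n_entries, block_size, blocks_per_chunk, batch_size, seed=0):
    """Yield batches of entry indices, one chunk in memory at a time."""
    rng = random.Random(seed)
    for chunk in make_chunks(n_entries, block_size, blocks_per_chunk, rng):
        rng.shuffle(chunk)  # shuffle entries inside the loaded chunk
        # Simplification: the last batch of a chunk may be smaller than
        # batch_size, since leftovers are not carried over to the next chunk.
        for i in range(0, len(chunk), batch_size):
            yield chunk[i:i + batch_size]


if __name__ == "__main__":
    for batch in iter_batches(n_entries=40, block_size=5,
                              blocks_per_chunk=4, batch_size=8, seed=1):
        print(batch)
```

In this sketch, a smaller `block_size` mixes the data more thoroughly at the cost of more scattered reads, while `blocks_per_chunk` sets how many distinct regions of the dataset contribute to each chunk.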

github-actions bot commented Oct 9, 2025

Test Results

    22 files      22 suites   3d 16h 2m 25s ⏱️
 3 692 tests  3 691 ✅ 0 💤 1 ❌
79 273 runs  79 268 ✅ 0 💤 5 ❌

For more details on these failures, see this check.

Results for commit 57396ba.

♻️ This comment has been updated with latest results.

Member

@vepadulano vepadulano left a comment

Thanks a lot for this great work! This is a first iteration of comments from my side.

@martinfoell
Contributor Author

Thank you for the review @vepadulano! I have implemented the changes that you suggested and left some comments where it was unclear what the code was doing.

2 participants