Fix batch size handling in DatasetTorch to ensure at least one batch per file #66
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.
AlexanderFengler left a comment
Left a few thoughts, thanks @cpaniaguam.
Definitely wasn't in good shape (not robust) before.
Thanks for looking into it.
AlexanderFengler left a comment
Some of the batch-size choices in the tests seem a bit crazy now, but let's roll with it :). Thanks @copilot
@AlexanderFengler I've opened a new pull request, #68, to work on those changes. Once the pull request is ready, I'll request review from you.
This pull request introduces several improvements and bug fixes to the DatasetTorch class and its usage, focusing on batch size handling, error checking, and test consistency. The main changes enforce that the batch size must evenly divide the number of samples per file, update the logic for calculating batches per file, and update tests and configuration to use consistent, valid batch sizes.

Core logic and validation improvements (see the sketch after this list):

- Updated DatasetTorch to raise a ValueError if batch_size does not evenly divide the number of samples per file, ensuring data consistency and preventing subtle bugs during batching.
- Reworked the indexing logic in __getitem__, improving code clarity and reliability.
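The PR description does not include the code itself; the following is a minimal sketch of the validation and indexing behavior it describes. The class shape, the attribute names samples_per_file and batches_per_file, and the returned slice are assumptions for illustration, not the repository's actual implementation:

```python
class DatasetTorch:
    """Sketch of the batch-size validation described above (assumed names)."""

    def __init__(self, samples_per_file: int, batch_size: int):
        # Reject batch sizes that do not evenly divide the per-file sample
        # count, so every file yields a whole number of batches.
        if samples_per_file % batch_size != 0:
            raise ValueError(
                f"batch_size ({batch_size}) must evenly divide the number "
                f"of samples per file ({samples_per_file})"
            )
        self.samples_per_file = samples_per_file
        self.batch_size = batch_size
        # With the divisibility check above, every file contributes at
        # least one full batch.
        self.batches_per_file = samples_per_file // batch_size

    def __getitem__(self, idx: int):
        # Map a flat batch index to (file index, batch-within-file index).
        file_idx, batch_idx = divmod(idx, self.batches_per_file)
        start = batch_idx * self.batch_size
        return file_idx, slice(start, start + self.batch_size)
```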
Test updates and coverage:

- Updated tests/test_torch_mlp.py to use batch sizes that evenly divide the sample count, and added a new test to verify that an error is raised if this condition is not met (a sketch of such a test follows this list). [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
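A sketch of what such an error test might look like, assuming the constructor signature from the sketch above; the real test in tests/test_torch_mlp.py may differ:

```python
import pytest

def test_invalid_batch_size_raises():
    # 7 does not evenly divide 100, so construction should fail loudly.
    # DatasetTorch here refers to the sketch above (assumed signature).
    with pytest.raises(ValueError, match="evenly divide"):
        DatasetTorch(samples_per_file=100, batch_size=7)
```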
Configuration and constants alignment:

- Updated configuration values and constants so that batch sizes remain consistent and valid with respect to the per-file sample counts.

Minor code cleanups:
- Replaced .keys() with direct in checks for dictionary membership throughout the code, for improved readability and Pythonic style (illustrated below). [1] [2] [3]
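For illustration, the before/after of this cleanup on a hypothetical dictionary (the key names are not from the repository):

```python
config = {"batch_size": 32, "learning_rate": 1e-3}

# Before: redundant call to .keys()
if "batch_size" in config.keys():
    print("found")

# After: a direct membership check is equivalent and more idiomatic
if "batch_size" in config:
    print("found")
```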