
Conversation

@cpaniaguam (Collaborator) commented Dec 11, 2025

This pull request introduces several improvements and bug fixes to the DatasetTorch class and its usage, focusing on batch size handling, error checking, and test consistency. The main changes enforce that the batch size evenly divides the number of samples per file, simplify the calculation of batches per file, and update tests and configuration to use consistent, valid batch sizes.

Core logic and validation improvements:

  • Added validation in DatasetTorch to raise a ValueError if batch_size does not evenly divide the number of samples per file, ensuring data consistency and preventing subtle bugs during batching.
  • Simplified the calculation of the number of batches per file and the logic for generating batch indices in __getitem__, improving code clarity and reliability.
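The two bullets above can be sketched as follows. This is a minimal stand-in, not the actual DatasetTorch implementation: the attribute names (samples_per_file, batch_size, batches_per_file) and the index-mapping logic are assumptions made for illustration.

```python
# Hypothetical sketch of the divisibility check and the simplified
# batches-per-file / batch-index logic described above.
class DatasetSketch:
    def __init__(self, samples_per_file: int, batch_size: int):
        # Fail fast if batch_size does not evenly divide the samples per
        # file, so batching never silently drops or wraps samples.
        if samples_per_file % batch_size != 0:
            raise ValueError(
                f"batch_size ({batch_size}) must evenly divide the number "
                f"of samples per file ({samples_per_file})."
            )
        self.samples_per_file = samples_per_file
        self.batch_size = batch_size
        # With divisibility guaranteed, this is exact integer division.
        self.batches_per_file = samples_per_file // batch_size

    def batch_indices(self, index: int) -> list:
        # Map a global batch index to the sample indices within its file:
        # reduce to a local batch number, then take a contiguous slice.
        local_batch = index % self.batches_per_file
        start = local_batch * self.batch_size
        return list(range(start, start + self.batch_size))
```

Because the constructor guarantees divisibility, `batch_indices` can rely on every batch being full-sized, which is what makes the `__getitem__` logic simpler.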

Test updates and coverage:

  • Updated all relevant tests in tests/test_torch_mlp.py to use batch sizes that evenly divide the sample count, and added a new test verifying that a ValueError is raised when this condition is not met.
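The new error test could look roughly like the following pytest sketch. The actual tests live in tests/test_torch_mlp.py and exercise DatasetTorch; here a minimal stand-in class with the same assumed validation behavior is used so the example is self-contained.

```python
# Hypothetical pytest sketch of the divisibility tests described above.
import pytest


class _FakeDataset:
    """Stand-in for DatasetTorch's batch-size validation (assumed behavior)."""

    def __init__(self, samples_per_file: int, batch_size: int):
        if samples_per_file % batch_size != 0:
            raise ValueError("batch_size must evenly divide samples per file")
        self.batches_per_file = samples_per_file // batch_size


def test_valid_batch_size():
    # 128 divides 1024 evenly, so construction succeeds.
    assert _FakeDataset(1024, 128).batches_per_file == 8


def test_invalid_batch_size_raises():
    # 100 does not divide 1024, so a ValueError is expected.
    with pytest.raises(ValueError):
        _FakeDataset(1024, 100)
```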

Configuration and constants alignment:

  • Updated batch size values in test configuration files and constants to reflect the new divisibility requirement, ensuring tests and sample configs remain valid.

Minor code cleanups:

  • Replaced usage of .keys() with direct in checks for dictionary membership throughout the code, for improved readability and more Pythonic style.
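A minimal illustration of that cleanup, using a made-up config dict (the key names are not from the lanfactory codebase):

```python
# Membership tests on a dict already check its keys, so `.keys()` is
# redundant; `key in d` is equivalent and more idiomatic.
config = {"batch_size": 128, "shuffle": True}

# Before the cleanup:
assert "batch_size" in config.keys()

# After the cleanup (equivalent, more Pythonic):
assert "batch_size" in config
assert "learning_rate" not in config
```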

@codecov

codecov bot commented Dec 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines | Coverage Δ
src/lanfactory/trainers/torch_mlp.py | 86.86% <100.00%> (+0.22%) ⬆️

Member

@AlexanderFengler AlexanderFengler left a comment


Left a few thoughts, thanks @cpaniaguam.
It definitely wasn't in good shape (not robust) before.
Thanks for looking into it.

@cpaniaguam cpaniaguam self-assigned this Dec 16, 2025
Member

@AlexanderFengler AlexanderFengler left a comment


Some of the batch-size choices in the tests seem a bit crazy now, but let's roll with it :). Thanks @copilot

@AlexanderFengler AlexanderFengler merged commit 5cc4437 into main Jan 4, 2026
7 checks passed

Copilot AI commented Jan 4, 2026

@AlexanderFengler I've opened a new pull request, #68, to work on those changes. Once the pull request is ready, I'll request review from you.
