
Fix: Reduce GPU batch size in test config to prevent dataset length calculation error #65

Merged

AlexanderFengler merged 1 commit into main from fix-test-torch-train-gpu on Dec 15, 2025
Conversation

@cpaniaguam
Collaborator

While testing LANfactory's compatibility with Python 3.13.*, I ran into an issue: the test_torch_train_cli_smoke test fails when CUDA is available because the configured GPU batch size (50,000) exceeds the number of samples per file in the test data (20,000). This causes DatasetTorch.__len__() to return 0, which makes PyTorch's RandomSampler fail with:

ValueError: num_samples should be a positive integer value, but got num_samples=0

Root Cause

In DatasetTorch.__len__(), the calculation:

length = (n_files * ((samples_per_file // batch_size) * batch_size)) // batch_size

returns 0 when batch_size > samples_per_file, because samples_per_file // batch_size truncates to 0 under integer division.

With the test configuration:

  • Test data: 2 files with 20,000 samples each
  • Train/val split (0.98): Only 1 file used for training
  • GPU batch size: 50,000
  • Result: (1 * ((20000 // 50000) * 50000)) // 50000 = 0
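The failure reproduces with plain integer arithmetic. A minimal sketch of the length formula (standalone, mirroring the expression from DatasetTorch.__len__ rather than importing LANfactory):

```python
def dataset_len(n_files: int, samples_per_file: int, batch_size: int) -> int:
    # Mirrors DatasetTorch.__len__: keep only whole batches per file,
    # then count total batches via integer division.
    return (n_files * ((samples_per_file // batch_size) * batch_size)) // batch_size

# Test configuration: 1 training file, 20,000 samples, GPU batch size 50,000.
print(dataset_len(1, 20_000, 50_000))  # → 0, so RandomSampler raises ValueError
```

Because 20000 // 50000 is 0, the whole product collapses to 0 regardless of the file count.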

Solution

Reduced GPU_BATCH_SIZE from 50,000 to 5,000 in config_network_training_lan.yaml. This ensures the batch size is smaller than the number of samples per file, allowing the dataset to have a positive length while still testing GPU functionality.

Note

The test_jax_train_cli_smoke test was passing because JAX defaults to CPU in the test environment, using the smaller CPU_BATCH_SIZE of 1,000.

@codecov

codecov bot commented Dec 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


@AlexanderFengler (Member) left a comment:

looks good.

@AlexanderFengler AlexanderFengler merged commit 7f787a6 into main Dec 15, 2025
6 checks passed

2 participants