
Fix: Reduce GPU batch size in test config to prevent dataset length calculation error #65

Merged

AlexanderFengler merged 1 commit into main from fix-test-torch-train-gpu on Dec 15, 2025
Conversation

@cpaniaguam
Collaborator

While testing LANfactory's compatibility with Python 3.13.*, I ran into an issue: the test_torch_train_cli_smoke test fails when CUDA is available because the configured GPU batch size (50,000) exceeds the number of samples per file in the test data (20,000). This causes DatasetTorch.__len__() to return 0, which makes PyTorch's RandomSampler fail with:

ValueError: num_samples should be a positive integer value, but got num_samples=0

Root Cause

In DatasetTorch.__len__(), the calculation:

length = (n_files * ((samples_per_file // batch_size) * batch_size)) // batch_size

returns 0 when batch_size > samples_per_file, because samples_per_file // batch_size truncates to 0 under integer division.

With the test configuration:

  • Test data: 2 files with 20,000 samples each
  • Train/val split (0.98): Only 1 file used for training
  • GPU batch size: 50,000
  • Result: (1 * ((20000 // 50000) * 50000)) // 50000 = 0
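The failure reproduces with plain integer arithmetic. A minimal sketch of the length formula (standalone, mirroring the expression from DatasetTorch.__len__ rather than importing LANfactory):

```python
def dataset_len(n_files: int, samples_per_file: int, batch_size: int) -> int:
    # Mirrors DatasetTorch.__len__: keep only whole batches per file,
    # then count total batches via integer division.
    return (n_files * ((samples_per_file // batch_size) * batch_size)) // batch_size

# Test configuration: 1 training file, 20,000 samples, GPU batch size 50,000.
print(dataset_len(1, 20_000, 50_000))  # → 0, so RandomSampler raises ValueError
```

Because 20000 // 50000 is 0, the whole product collapses to 0 regardless of the file count.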

Solution

Reduced GPU_BATCH_SIZE from 50,000 to 5,000 in config_network_training_lan.yaml. This ensures the batch size is smaller than the number of samples per file, allowing the dataset to have a positive length while still testing GPU functionality.

Note

The test_jax_train_cli_smoke test was passing because JAX defaults to CPU in the test environment, using the smaller CPU_BATCH_SIZE of 1,000.

@codecov

codecov bot commented Dec 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


@AlexanderFengler (Member) left a comment:

looks good.

@AlexanderFengler AlexanderFengler merged commit 7f787a6 into main Dec 15, 2025
6 checks passed

2 participants