fix: accumulate train_loss correctly across micro-steps#357

Open
sjhddh wants to merge 1 commit into karpathy:master from sjhddh:fix-grad-accum-loss
Conversation

@sjhddh sjhddh commented Mar 20, 2026

Currently, train_loss = loss.detach() overwrites the loss tracking on each micro-step, meaning the reported train_loss_f solely reflects the loss of the final micro-batch instead of the true mean of the entire global batch.

This PR fixes that by accumulating loss.detach() / grad_accum_steps across the loop, so the reported value is the exact mean loss over the global batch. This also reduces the variance in the logged debiased_smooth_loss.
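A minimal sketch of the two patterns, using plain floats in place of `loss.detach()` tensors (the `micro_losses` values are made up for illustration):

```python
# Per-micro-batch losses for one global batch (hypothetical values).
micro_losses = [2.0, 4.0, 3.0, 1.0]
grad_accum_steps = len(micro_losses)

# Buggy pattern: train_loss is overwritten on every micro-step, so the
# reported value is just the final micro-batch's loss.
train_loss = 0.0
for loss in micro_losses:
    train_loss = loss          # overwrites previous micro-steps
buggy = train_loss             # only the last micro-batch survives

# Fixed pattern: accumulate loss / grad_accum_steps, which yields the
# exact mean loss over the whole global batch.
train_loss = 0.0
for loss in micro_losses:
    train_loss += loss / grad_accum_steps
fixed = train_loss             # true mean over all micro-batches

print(buggy, fixed)  # 1.0 vs 2.5
```

In the real training loop each `loss` would be a detached tensor, but the arithmetic is identical: dividing by `grad_accum_steps` inside the loop makes the running sum equal the mean once all micro-steps have run.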
