fix: accumulate train_loss correctly across micro-steps #357
Open
sjhddh wants to merge 1 commit into karpathy:master from
Conversation
Currently, `train_loss = loss.detach()` overwrites the loss tracking on each micro-step, so the reported `train_loss_f` reflects only the loss of the final micro-batch instead of the true mean over the entire global batch.

This PR fixes that by accumulating `loss.detach() / grad_accum_steps` over the loop. This reduces the variance in the logged `debiased_smooth_loss` and provides an exact mean loss for the global batch.
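For context, a minimal sketch of the accumulation pattern described above, assuming a typical gradient-accumulation loop; names such as `model`, `train_loader`, and `optimizer` are placeholders and not the repository's exact code:

```python
# Hypothetical gradient-accumulation loop illustrating the fix.
train_loss = 0.0  # accumulate across micro-steps instead of overwriting
for micro_step in range(grad_accum_steps):
    x, y = next(train_loader)          # placeholder data loader
    loss = model(x, y)                 # mean loss of this micro-batch
    # fold this micro-batch's contribution into the global-batch mean
    train_loss += loss.detach() / grad_accum_steps
    (loss / grad_accum_steps).backward()  # gradients scaled the same way
optimizer.step()
optimizer.zero_grad(set_to_none=True)
train_loss_f = train_loss.item()       # exact mean loss over the global batch
```

With the overwrite (`train_loss = loss.detach()`), only the last micro-batch's loss survives the loop; with the accumulation shown here, `train_loss_f` equals the average loss over all `grad_accum_steps` micro-batches.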