Fix checkpoint saving using wrong interval config #320

Open
Mr-Neutr0n wants to merge 1 commit into google-research:main from Mr-Neutr0n:fix/checkpoint-interval-bug

@Mr-Neutr0n

Summary

The checkpoint-saving condition in train.py (line 242) checks config.checkpoint_every to decide whether checkpointing is enabled, but then uses config.eval_every for the modulo operation:

# Before (bug):
if ((config.checkpoint_every and step % config.eval_every == 0) or
    step == total_steps):

# After (fix):
if ((config.checkpoint_every and step % config.checkpoint_every == 0) or
    step == total_steps):

As a result, checkpoints are saved at evaluation intervals instead of the configured checkpoint intervals: checkpoint_every only toggles checkpointing on or off and has no effect on save frequency.
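
To make the effect concrete, here is a minimal sketch that is not taken from train.py. It reproduces only the two conditions, using a hypothetical `Config` dataclass, a hypothetical `save_steps` helper, and illustrative values (eval_every=100, checkpoint_every=250, 1000 total steps, steps counted from 1). All of these are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical stand-in for the training config; values are illustrative.
    checkpoint_every: int = 250
    eval_every: int = 100


def save_steps(config, total_steps, fixed):
    """Return the steps at which the save condition evaluates to true."""
    # fixed=False mimics the old condition, fixed=True the corrected one.
    interval = config.checkpoint_every if fixed else config.eval_every
    steps = []
    for step in range(1, total_steps + 1):  # assumes steps start at 1
        if ((config.checkpoint_every and step % interval == 0) or
                step == total_steps):
            steps.append(step)
    return steps


config = Config()
buggy_steps = save_steps(config, total_steps=1000, fixed=False)
fixed_steps = save_steps(config, total_steps=1000, fixed=True)
print("before:", buggy_steps)  # every 100 steps
print("after: ", fixed_steps)  # every 250 steps
```

With these assumed values the old condition saves ten checkpoints (every 100 steps), while the corrected one saves four (steps 250, 500, 750, and 1000).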

Test plan

  • Verified the fix by inspecting the changed line
  • The fix is a single-token change from config.eval_every to config.checkpoint_every in the modulo check, matching the guard condition on the same line
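
Beyond inspection, the corrected condition could be covered by a small regression test. The sketch below assumes the condition is factored into a hypothetical `should_checkpoint` helper, which does not exist in the repository. The test names and values are illustrative only.

```python
def should_checkpoint(step, checkpoint_every, total_steps):
    """Hypothetical helper mirroring the fixed condition from this PR."""
    return bool((checkpoint_every and step % checkpoint_every == 0) or
                step == total_steps)


def test_saves_on_checkpoint_interval():
    saved = [s for s in range(1, 1001) if should_checkpoint(s, 250, 1000)]
    assert saved == [250, 500, 750, 1000]


def test_disabled_interval_still_saves_final_step():
    # checkpoint_every=0 short-circuits before the modulo, so there is
    # no ZeroDivisionError and only the final step is saved.
    saved = [s for s in range(1, 1001) if should_checkpoint(s, 0, 1000)]
    assert saved == [1000]


def test_final_step_saved_when_not_a_multiple():
    saved = [s for s in range(1, 1001) if should_checkpoint(s, 300, 1000)]
    assert saved == [300, 600, 900, 1000]
```

The second test checks the guard on the same line: a falsy checkpoint_every disables periodic saves but still keeps the final-step checkpoint.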

The checkpoint saving condition checked `config.checkpoint_every` to
determine whether checkpointing is enabled, but then incorrectly used
`config.eval_every` for the modulo operation. This caused checkpoints
to be saved at evaluation intervals instead of the configured checkpoint
intervals, making the `checkpoint_every` config value effectively ignored.

Replace `step % config.eval_every` with `step % config.checkpoint_every`
so checkpoints are saved at the intended frequency.

google-cla Bot commented Feb 11, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.
