Fix off-by-one, log-zero, and start_loss-zero bugs in reweight() #91
Merged
Three latent bugs in `src/microcalibrate/reweight.py`:

1. The dense training loop guarded its gradient step with `if i != max_epochs - 1` where `max_epochs = epochs - 1`, which actually skipped the penultimate epoch (`i == epochs - 2`) while still stepping on the final epoch. The returned `final_weights` therefore drifted one step away from the final tracked row. Every epoch now steps, and the tracker always ends on the final epoch, so logged estimates correspond to the pre-step state of the returned weights.
2. `np.log(original_weights + random_noise)` produced `-inf` (and downstream NaN gradients) whenever an initial weight was zero and `noise_level` was zero; the L0 branch hit the same issue even with nonzero noise because it logs the raw weights. Both call sites now clamp inputs to `>= 1e-12`.
3. The sparse L0 loop computed `(l.item() - start_loss) / start_loss` unconditionally; when `start_loss` happened to be ~0 (trivially calibrated data with `l0_lambda = 0`) this raised `ZeroDivisionError` inside the tqdm postfix. A small-magnitude guard now short-circuits the display value to 0.0.

Adds `tests/test_reweight_regression.py` covering all three fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
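The two numerical guards (bugs 2 and 3 above) can be sketched in isolation. This is a minimal illustration, not the actual `reweight()` code; the variable names and the 1e-12 threshold are taken from the PR description, everything else is assumed for the example.

```python
import numpy as np

# Bug 2 fix: clamp before logging so zero initial weights no longer
# produce -inf log-weights (and NaN gradients downstream).
original_weights = np.array([0.0, 1.0, 2.5])
noise_level = 0.0  # the pathological case: zero weight + zero noise
random_noise = np.random.normal(0.0, noise_level, size=original_weights.shape)

log_weights = np.log(np.maximum(original_weights + random_noise, 1e-12))
assert np.all(np.isfinite(log_weights))

# Bug 3 fix: guard the relative-loss display value against a ~0 start_loss
# instead of dividing unconditionally.
def relative_loss(loss: float, start_loss: float) -> float:
    if abs(start_loss) < 1e-12:
        return 0.0  # short-circuit for trivially calibrated data
    return (loss - start_loss) / start_loss

assert relative_loss(0.5, 0.0) == 0.0   # no ZeroDivisionError
assert relative_loss(0.5, 1.0) == -0.5  # ordinary relative change
```

Using `np.maximum` rather than adding an epsilon keeps already-positive weights exactly unchanged, which matches the "clamp inputs to >= 1e-12" wording above.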
MaxGhenis (Contributor, Author) commented on Apr 17, 2026:
LGTM (cannot self-approve; posting as comment).
- Off-by-one fix: removed the `max_epochs = epochs - 1` guard so every epoch steps. The loop semantics are now clean and documented in the code: the pre-step state of epoch `i` is logged (when tracked); the post-last-step state is returned. The asymmetry between the final logged row and the returned weights is acknowledged in the inline comment, a reasonable choice vs. running an extra forward pass after the last step.
- The `is_final_epoch` guard ensures the tracker always contains epoch `epochs - 1`, correctly fixing the diagnostic/returned-state disagreement.
- `np.maximum(..., 1e-12)` is added on both the dense and sparse log-weight initialisations, consistently.
- The `start_loss` divide-by-zero guard uses `abs(start_loss) < 1e-12`, which is correct for both zero and near-zero starts.
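The off-by-one the review describes can be reduced to a counting sketch. This is an illustrative loop skeleton, not the actual `reweight()` implementation: under the old guard, the condition `i != max_epochs - 1` with `max_epochs = epochs - 1` skipped epoch `epochs - 2` rather than the last one.

```python
def run_epochs_old(epochs: int) -> int:
    """Old behavior: the guard skips i == epochs - 2, not the final epoch."""
    max_epochs = epochs - 1
    steps = 0
    for i in range(epochs):
        if i != max_epochs - 1:  # true for every epoch except epochs - 2
            steps += 1           # gradient step would happen here
    return steps

def run_epochs_fixed(epochs: int) -> int:
    """Fixed behavior: every epoch takes its gradient step."""
    steps = 0
    for i in range(epochs):
        steps += 1
    return steps

assert run_epochs_old(25) == 24    # one step silently dropped mid-run
assert run_epochs_fixed(25) == 25  # all 25 epochs step
```

The skipped step happening in the middle of the run (not at the end) is why the returned weights drifted one step away from the final tracked row.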
Minor (non-blocking) notes:
- `test_all_epochs_step` compares weights after N and N-1 epochs. Under the original bug the two runs still produced different weights (different epochs were skipped), so this test doesn't strictly regress the off-by-one. `test_final_epoch_matches_tracker` also doesn't regress it when `tracking_n | (epochs - 1)` (the default case `epochs=25, tracking_n=2`). For a tighter regression test, consider `epochs=30` so `(epochs - 1) % tracking_n != 0`. The fix itself is correct; this is just test tightness.
- Pre-existing `test_evaluate_holdout_robustness` flakiness is unrelated to this PR: this PR doesn't touch RNG seeding (that's #93), so the underlying unseeded-numpy cause is present on main too.
- Unrelated repo infra: `.python-version` still pins 3.11 while `pyproject.toml` requires >=3.13. Worth a follow-up cleanup, but out of scope here.
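The test-tightness point about `tracking_n` dividing `epochs - 1` is a one-line divisibility check. A hedged illustration (the function name is hypothetical; the parameter defaults are the ones quoted in the review):

```python
def final_epoch_tracked_by_default(epochs: int, tracking_n: int) -> bool:
    """True when the final epoch index (epochs - 1) is a regularly
    tracked epoch, i.e. the tracker would contain it even without the
    is_final_epoch fix."""
    return (epochs - 1) % tracking_n == 0

# Default case from the review: epoch 24 is tracked anyway, so the
# test passes with or without the fix.
assert final_epoch_tracked_by_default(25, 2)

# Suggested tighter case: epoch 29 is only in the tracker because of
# the is_final_epoch guard, so this setting actually exercises the fix.
assert not final_epoch_tracked_by_default(30, 2)
```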
Summary
Three latent bugs in `src/microcalibrate/reweight.py`:

1. The dense training loop guarded its gradient step with `if i != max_epochs - 1` where `max_epochs = epochs - 1`, which actually skipped the penultimate epoch (`i == epochs - 2`) while still stepping on the final epoch. The returned `final_weights` therefore drifted one step away from the final tracked row. Every epoch now steps, and the tracker always ends on the final epoch, so logged estimates correspond to the pre-step state of the returned weights.
2. The sparse L0 loop computed `(l.item() - start_loss) / start_loss` unconditionally; when `start_loss` was ~0 (trivial/pre-calibrated data with `l0_lambda=0`) this raised `ZeroDivisionError` inside the `tqdm` postfix. A magnitude guard now short-circuits to `0.0`.
3. `np.log(original_weights + random_noise)` produced `-inf` (and downstream NaN gradients) whenever an initial weight was zero and `noise_level` was zero; the L0 branch hit the same issue even with nonzero noise because it logs the raw weights. Both call sites now clamp inputs to `>= 1e-12`.

Test plan
- `tests/test_reweight_regression.py` covering all three fixes: tracker includes the final epoch, N vs N-1 epochs produce different weights (every epoch steps), sparse loop does not crash when `start_loss == 0`, zero initial weights in the L0 path do not produce non-finite weights.
- `uv run pytest tests -x -q` -> 19 passed.

🤖 Generated with Claude Code