Fix attribution inflation from intermediate commits#870
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes incorrect attribution math in manual-commit sessions by preventing intermediate-commit inflation, excluding pre-session worktree dirt from “human” counts, and adding a holistic combined attribution for multi-session checkpoints.
Changes:
- Adjust attribution inputs/classification to avoid counting changes outside the current commit and to exclude PA1 (pre-session) baseline from human totals.
- Thread cross-session
allAgentFilesthrough attribution to prevent multi-session “human” inflation and addcombined_attributionpersisted on checkpoint summaries. - Persist per-prompt attribution diagnostics into session metadata and add tests covering baseline/multi-session cases.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| cmd/entire/cli/strategy/manual_commit_test.go | Updates expectations and adds tests for metadata filtering in committed-files fallback. |
| cmd/entire/cli/strategy/manual_commit_hooks.go | Computes allAgentFiles, guards PA1 shadow-branch usage, and computes/persists combined attribution. |
| cmd/entire/cli/strategy/manual_commit_condensation.go | Filters .entire/ out of fallback FilesTouched, threads allAgentFiles, and persists prompt-attribution diagnostics. |
| cmd/entire/cli/strategy/manual_commit_attribution.go | Adds cross-session/metadata exclusion and PA1 baseline subtraction logic. |
| cmd/entire/cli/strategy/manual_commit_attribution_test.go | Adds regression tests for baseline dirt, parent-tree diffing, and cross-session exclusion. |
| cmd/entire/cli/checkpoint/v2_committed.go | Persists prompt-attribution diagnostics into v2 session metadata. |
| cmd/entire/cli/checkpoint/committed.go | Persists prompt-attribution diagnostics, adds combined attribution preservation + update API, and adds session-metadata read helper. |
| cmd/entire/cli/checkpoint/checkpoint.go | Adds CombinedAttribution to checkpoint summaries and stores prompt-attribution diagnostics in committed metadata. |
| .gitignore | Ignores .tmp/ directory. |
Comments suppressed due to low confidence (1)
cmd/entire/cli/strategy/manual_commit_attribution.go:243
- Baseline (PA1) subtraction is only applied to accumulated user additions, but accumulated user removals still include PA1. If a worktree starts with pre-session deletions (PA1 UserLinesRemoved / UserRemovedPerFile), those removals will still be counted as in-session human_removed and will skew attribution. Consider tracking baseline removals per file and subtracting them from
classified.removedFromAgentFiles/classified.removedFromCommittedNonAgent(or fromtotalUserRemoved) the same way additions are handled.
totalUserAdded := relevantAccumulatedUser + agentDiffs.postCheckpointUserAdded + postToNonAgentFiles
// Use per-file filtered removals (symmetric with totalUserAdded) to avoid
// double-counting non-agent removals that also appear in nonAgent.userRemovedFromNonAgentFiles.
relevantAccumulatedRemoved := classified.removedFromAgentFiles + classified.removedFromCommittedNonAgent
totalUserRemoved := relevantAccumulatedRemoved + agentDiffs.postCheckpointUserRemoved
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit b221159. Configure here.
| TokenUsage: sessionData.TokenUsage, | ||
| SessionMetrics: buildSessionMetrics(state), | ||
| InitialAttribution: attribution, | ||
| PromptAttributionsJSON: marshalPromptAttributions(state.PromptAttributions), |
There was a problem hiding this comment.
Diagnostic JSON omits PendingPromptAttribution for mid-turn commits
Medium Severity
marshalPromptAttributions(state.PromptAttributions) persists only the already-finalized prompt attributions, but the attribution calculation in calculateSessionAttributions appends state.PendingPromptAttribution when non-nil (mid-turn commits like Codex). For those commits, state.PromptAttributions is empty since SaveStep never ran, so the diagnostic field will be nil despite the calculation using real PA data. The PromptAttributionsJSON field's purpose is to show "exactly which prompt recorded which 'user' lines" — it won't fulfill that for mid-turn commits.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit b221159. Configure here.
| // Returns nil if no hook-provided metrics exist (e.g., for agents that don't report them). | ||
| // marshalPromptAttributions encodes PromptAttributions to JSON for diagnostic persistence. | ||
| // Returns nil if there are no attributions to persist. | ||
| func marshalPromptAttributions(pas []PromptAttribution) json.RawMessage { |
There was a problem hiding this comment.
Godoc comment attached to wrong function after insertion
Low Severity
The marshalPromptAttributions function was inserted between the existing godoc comment for buildSessionMetrics and the buildSessionMetrics function itself. Now marshalPromptAttributions has buildSessionMetrics's documentation as its godoc prefix, and buildSessionMetrics (line 328) has no godoc at all.
Reviewed by Cursor Bugbot for commit b221159. Configure here.
| // baseline tracks PA1 (CheckpointNumber <= 1) edits separately. | ||
| // PA1 captures pre-session worktree dirt that existed before the agent started, | ||
| // so it should be excluded from human contribution counts. | ||
| baselineUserRemoved int |
There was a problem hiding this comment.
Field baselineUserRemoved is accumulated but never read
Low Severity
The new baselineUserRemoved field in accumulatedEdits is populated in accumulatePromptEdits but never read anywhere. Its counterpart baselineUserAddedPerFile is correctly used in classifyBaselineEdits, but baseline removals are neither classified nor subtracted from relevantAccumulatedRemoved. This is dead code that may indicate an incomplete implementation of baseline removal exclusion.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit b221159. Configure here.


Due to #869 this is a squashed version of that removes the accidentally committed
.tmp: #812Note
Medium Risk
Changes core attribution math and checkpoint metadata persistence, which could affect reported agent/human line counts across sessions and commits. Risk is moderated by added tests, but regressions could impact analytics/diagnostics for existing checkpoints.
Overview
Fixes attribution inflation in
manual-commitby (1) counting non-agent file changes againstparent→HEADwhen available (so intermediate commits aren’t double-counted), (2) excluding pre-session “baseline” edits captured in PA1 from human contribution totals, and (3) excluding files touched by other concurrent agent sessions from being misclassified as human work.Persists raw per-prompt attribution data (
prompt_attributions) into per-sessionmetadata.jsonfor diagnostics, and adds a root-levelcombined_attributionon checkpointmetadata.jsoncomputed once per multi-session commit (with a newUpdateCheckpointSummarypath and lightweightReadSessionMetadata). Also tightensfiles_touchedfallback to avoid attributing CLI metadata files, and updates.gitignoreto ignore.tmp/.Reviewed by Cursor Bugbot for commit b221159. Configure here.