Fix attribution inflation from intermediate commits #812
Merged
gtrrz-victor merged 13 commits into main on Apr 7, 2026
Entire-Checkpoint: 079c1c0e0eeb
Pull request overview
This PR addresses attribution “inflation” caused by counting non-agent file changes from intermediate commits by switching non-agent file detection to prefer a per-commit diff base (first parent → HEAD) when available.
Changes:
- Plumbs `parentCommitHash` from the post-commit hook into condensation/attribution calculation.
- Updates attribution logic to prefer `parentCommitHash` → `headCommitHash` for enumerating non-agent changed files (falling back to `attributionBaseCommit` → `headCommitHash`).
- Updates unit tests to match the new `CalculateAttributionWithAccumulated` signature.
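The diff-base preference described in the list above can be sketched as a small helper. This is a minimal illustration, not the PR's code: `pickDiffBase` is a hypothetical name, and only the fallback order (first parent, then attribution base) is taken from the overview.

```go
package main

import "fmt"

// pickDiffBase (hypothetical helper) returns the commit hash to diff against
// HEAD when enumerating non-agent changed files. It prefers HEAD's first
// parent so that only this commit's changes are seen, and falls back to the
// session's attribution base when no parent is known (for example on an
// initial commit). An empty result means neither hash is available.
func pickDiffBase(parentCommitHash, attributionBaseCommit string) string {
	if parentCommitHash != "" {
		return parentCommitHash
	}
	return attributionBaseCommit
}

func main() {
	fmt.Println(pickDiffBase("p1", "b0")) // parent known: diff p1 -> HEAD
	fmt.Println(pickDiffBase("", "b0"))   // no parent: diff b0 -> HEAD
}
```

Per the code comments in the diff, when both hashes are empty the real code falls back to a go-git tree walk; that path is not modeled here.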
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| cmd/entire/cli/strategy/manual_commit_hooks.go | Captures HEAD’s first parent hash during post-commit handling and passes it into condensation options. |
| cmd/entire/cli/strategy/manual_commit_condensation.go | Adds parentCommitHash to condensation/attribution option structs and threads it through to attribution calculation. |
| cmd/entire/cli/strategy/manual_commit_attribution.go | Prefers parentCommitHash as the diff base for non-agent changed-file enumeration. |
| cmd/entire/cli/strategy/manual_commit_attribution_test.go | Updates calls to CalculateAttributionWithAccumulated for the new parameter list. |
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Non-agent file line counting inconsistent with file scoping
- Threaded parentTree through condenseOpts/attributionOpts/CalculateAttributionWithAccumulated and used it instead of baseTree for non-agent file line counting, so diffs are now parent→head (consistent with file scoping) instead of session-base→head.
- ✅ Fixed: Duplicated parent commit hash computation in handlers
- Precomputed parentCommitHash alongside parentTree in PostCommit, stored it on the postCommitActionHandler struct, and replaced the duplicated 6-line blocks in HandleCondense and HandleCondenseIfFilesTouched with the precomputed field.
Or push these changes by commenting:
@cursor push 6cd3ef87a2
Preview (6cd3ef87a2)
diff --git a/cmd/entire/cli/strategy/manual_commit_attribution.go b/cmd/entire/cli/strategy/manual_commit_attribution.go
--- a/cmd/entire/cli/strategy/manual_commit_attribution.go
+++ b/cmd/entire/cli/strategy/manual_commit_attribution.go
@@ -185,6 +185,11 @@
// For initial commits (no parent), falls back to attributionBaseCommit→headCommitHash.
// When hashes are empty, falls back to go-git tree walk.
//
+// parentTree is the tree of the parent commit (nil for initial commits). When provided
+// alongside parentCommitHash, non-agent file line counting uses parentTree instead of
+// baseTree so that only THIS commit's changes are counted (consistent with the file
+// scoping from parentCommitHash→headCommitHash).
+//
// Note: Binary files (detected by null bytes) are silently excluded from attribution
// calculations since line-based diffing only applies to text files.
//
@@ -200,6 +205,7 @@
parentCommitHash string,
attributionBaseCommit string,
headCommitHash string,
+ parentTree *object.Tree,
) *checkpoint.InitialAttribution {
if len(filesTouched) == 0 {
return nil
@@ -253,7 +259,13 @@
if diffBaseCommit == "" {
diffBaseCommit = attributionBaseCommit
}
- allChangedFiles, err := getAllChangedFiles(ctx, baseTree, headTree, repoDir, diffBaseCommit, headCommitHash)
+ // Use parentTree for line counting when available (consistent with file scoping).
+ // For initial commits, fall back to session baseTree.
+ nonAgentDiffTree := parentTree
+ if nonAgentDiffTree == nil {
+ nonAgentDiffTree = baseTree
+ }
+ allChangedFiles, err := getAllChangedFiles(ctx, nonAgentDiffTree, headTree, repoDir, diffBaseCommit, headCommitHash)
if err != nil {
logging.Warn(logging.WithComponent(ctx, "attribution"),
"attribution: failed to enumerate changed files",
@@ -267,9 +279,9 @@
continue // Skip agent-touched files
}
- baseContent := getFileContent(baseTree, filePath)
+ diffBaseContent := getFileContent(nonAgentDiffTree, filePath)
headContent := getFileContent(headTree, filePath)
- _, userAdded, _ := diffLines(baseContent, headContent)
+ _, userAdded, _ := diffLines(diffBaseContent, headContent)
allUserEditsToNonAgentFiles += userAdded
}
diff --git a/cmd/entire/cli/strategy/manual_commit_attribution_test.go b/cmd/entire/cli/strategy/manual_commit_attribution_test.go
--- a/cmd/entire/cli/strategy/manual_commit_attribution_test.go
+++ b/cmd/entire/cli/strategy/manual_commit_attribution_test.go
@@ -281,7 +281,7 @@
result := CalculateAttributionWithAccumulated(
context.Background(),
- baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+ baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
)
require.NotNil(t, result, "expected non-nil result")
@@ -338,7 +338,7 @@
result := CalculateAttributionWithAccumulated(
context.Background(),
- baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+ baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
)
require.NotNil(t, result, "expected non-nil result")
@@ -395,7 +395,7 @@
result := CalculateAttributionWithAccumulated(
context.Background(),
- baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+ baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
)
require.NotNil(t, result, "expected non-nil result")
@@ -444,7 +444,7 @@
result := CalculateAttributionWithAccumulated(
context.Background(),
- baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+ baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
)
require.NotNil(t, result, "expected non-nil result")
@@ -496,7 +496,7 @@
result := CalculateAttributionWithAccumulated(
context.Background(),
- baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+ baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
)
require.NotNil(t, result, "expected non-nil result")
@@ -550,7 +550,7 @@
result := CalculateAttributionWithAccumulated(
context.Background(),
- baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+ baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
)
require.NotNil(t, result, "expected non-nil result")
@@ -619,7 +619,7 @@
result := CalculateAttributionWithAccumulated(
context.Background(),
- baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+ baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
)
require.NotNil(t, result, "expected non-nil result")
@@ -662,7 +662,7 @@
result := CalculateAttributionWithAccumulated(
context.Background(),
- baseTree, shadowTree, headTree, []string{}, []PromptAttribution{}, "", "", "", "",
+ baseTree, shadowTree, headTree, []string{}, []PromptAttribution{}, "", "", "", "", nil,
)
if result != nil {
@@ -716,7 +716,7 @@
result := CalculateAttributionWithAccumulated(
context.Background(),
- baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+ baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
)
require.NotNil(t, result, "expected non-nil result")
@@ -1021,7 +1021,7 @@
result := CalculateAttributionWithAccumulated(
context.Background(),
- baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+ baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
)
require.NotNil(t, result, "expected non-nil result")
@@ -1092,7 +1092,7 @@
result := CalculateAttributionWithAccumulated(
context.Background(),
- baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+ baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
)
require.NotNil(t, result, "expected non-nil result")
@@ -1173,7 +1173,7 @@
result := CalculateAttributionWithAccumulated(
context.Background(),
- baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "",
+ baseTree, shadowTree, headTree, filesTouched, promptAttributions, "", "", "", "", nil,
)
require.NotNil(t, result, "expected non-nil result")
diff --git a/cmd/entire/cli/strategy/manual_commit_condensation.go b/cmd/entire/cli/strategy/manual_commit_condensation.go
--- a/cmd/entire/cli/strategy/manual_commit_condensation.go
+++ b/cmd/entire/cli/strategy/manual_commit_condensation.go
@@ -89,6 +89,7 @@
type condenseOpts struct {
shadowRef *plumbing.Reference // Pre-resolved shadow branch ref (nil = resolve from repo)
headTree *object.Tree // Pre-resolved HEAD tree (passed through to calculateSessionAttributions)
+ parentTree *object.Tree // Pre-resolved parent tree (nil for initial commits, passed through for attribution)
repoDir string // Repository worktree path for git CLI commands
parentCommitHash string // HEAD's first parent hash for per-commit non-agent file detection
headCommitHash string // HEAD commit hash (passed through for attribution)
@@ -196,6 +197,7 @@
attribution := calculateSessionAttributions(ctx, repo, ref, sessionData, state, attributionOpts{
headTree: o.headTree,
+ parentTree: o.parentTree,
repoDir: o.repoDir,
attributionBaseCommit: attrBase,
parentCommitHash: o.parentCommitHash,
@@ -339,6 +341,7 @@
type attributionOpts struct {
headTree *object.Tree // HEAD commit tree (already resolved by PostCommit)
shadowTree *object.Tree // Shadow branch tree (already resolved by PostCommit)
+ parentTree *object.Tree // Parent commit tree (already resolved by PostCommit, nil for initial commits)
repoDir string // Repository worktree path for git CLI commands
attributionBaseCommit string // Base commit hash for non-agent file detection (empty = fall back to go-git tree walk)
parentCommitHash string // HEAD's first parent hash (preferred diff base for non-agent files)
@@ -455,6 +458,7 @@
o.parentCommitHash,
o.attributionBaseCommit,
o.headCommitHash,
+ o.parentTree,
)
if attribution != nil {
diff --git a/cmd/entire/cli/strategy/manual_commit_hooks.go b/cmd/entire/cli/strategy/manual_commit_hooks.go
--- a/cmd/entire/cli/strategy/manual_commit_hooks.go
+++ b/cmd/entire/cli/strategy/manual_commit_hooks.go
@@ -617,10 +617,11 @@
// Cached git objects — resolved once per PostCommit invocation to avoid
// redundant reads across filesOverlapWithContent, filesWithRemainingAgentChanges,
// CondenseSession, and calculateSessionAttributions.
- headTree *object.Tree // HEAD commit tree (shared across all sessions)
- parentTree *object.Tree // HEAD's first parent tree (shared, nil for initial commits)
- shadowRef *plumbing.Reference // Per-session shadow branch ref (nil if branch doesn't exist)
- shadowTree *object.Tree // Per-session shadow commit tree (nil if branch doesn't exist)
+ headTree *object.Tree // HEAD commit tree (shared across all sessions)
+ parentTree *object.Tree // HEAD's first parent tree (shared, nil for initial commits)
+ parentCommitHash string // HEAD's first parent hash (empty for initial commits)
+ shadowRef *plumbing.Reference // Per-session shadow branch ref (nil if branch doesn't exist)
+ shadowTree *object.Tree // Per-session shadow commit tree (nil if branch doesn't exist)
// Output: set by handler methods, read by caller after TransitionAndLog.
condensed bool
@@ -639,17 +640,12 @@
)
if shouldCondense {
- parentCommitHash := ""
- if h.commit.NumParents() > 0 {
- if parent, err := h.commit.Parent(0); err == nil {
- parentCommitHash = parent.Hash.String()
- }
- }
h.condensed = h.s.condenseAndUpdateState(h.ctx, h.repo, h.checkpointID, state, h.head, h.shadowBranchName, h.shadowBranchesToDelete, h.committedFileSet, condenseOpts{
shadowRef: h.shadowRef,
headTree: h.headTree,
+ parentTree: h.parentTree,
repoDir: h.repoDir,
- parentCommitHash: parentCommitHash,
+ parentCommitHash: h.parentCommitHash,
headCommitHash: h.newHead,
})
} else {
@@ -672,17 +668,12 @@
)
if shouldCondense {
- parentCommitHash := ""
- if h.commit.NumParents() > 0 {
- if parent, err := h.commit.Parent(0); err == nil {
- parentCommitHash = parent.Hash.String()
- }
- }
h.condensed = h.s.condenseAndUpdateState(h.ctx, h.repo, h.checkpointID, state, h.head, h.shadowBranchName, h.shadowBranchesToDelete, h.committedFileSet, condenseOpts{
shadowRef: h.shadowRef,
headTree: h.headTree,
+ parentTree: h.parentTree,
repoDir: h.repoDir,
- parentCommitHash: parentCommitHash,
+ parentCommitHash: h.parentCommitHash,
headCommitHash: h.newHead,
})
} else {
@@ -851,8 +842,10 @@
headTree = t
}
var parentTree *object.Tree
+ var parentCommitHash string
if commit.NumParents() > 0 {
if parent, err := commit.Parent(0); err == nil {
+ parentCommitHash = parent.Hash.String()
if t, err := parent.Tree(); err == nil {
parentTree = t
}
@@ -871,8 +864,8 @@
}
iterCtx, iterSpan := processSessionsLoop.Iteration(loopCtx)
s.postCommitProcessSession(iterCtx, repo, state, &transitionCtx, checkpointID,
- head, commit, newHead, worktreePath, headTree, parentTree, committedFileSet,
- shadowBranchesToDelete, uncondensedActiveOnBranch)
+ head, commit, newHead, worktreePath, headTree, parentTree, parentCommitHash,
+ committedFileSet, shadowBranchesToDelete, uncondensedActiveOnBranch)
iterSpan.End()
}
processSessionsLoop.End()
@@ -917,6 +910,7 @@
newHead string,
repoDir string,
headTree, parentTree *object.Tree,
+ parentCommitHash string,
committedFileSet map[string]struct{},
shadowBranchesToDelete map[string]struct{},
uncondensedActiveOnBranch map[string]bool,
@@ -1008,6 +1002,7 @@
filesTouchedBefore: filesTouchedBefore,
headTree: headTree,
parentTree: parentTree,
+ parentCommitHash: parentCommitHash,
shadowRef: shadowRef,
shadowTree: shadowTree,
}
Pre-session dirty files (CLI config files from `entire enable`, leftover changes from previous sessions) were incorrectly counted as human contributions, deflating agent percentage.
Root cause: PA1 (first prompt attribution) captures worktree state at session start. This data was used to correct agent line counts (correct) but also added to human contributions (wrong).
Fix:
- Split prompt attributions into baseline (PA1) and session (PA2+)
- PA1 data still subtracted from agent work (correct agent calc)
- PA1 contributions excluded from relevantAccumulatedUser
- PA1 removals excluded from totalUserRemoved
- Include PendingPromptAttribution during condensation for agents that skip SaveStep (e.g., Codex mid-turn commits)
- Add .entire/ filter to attribution calc (matches existing PA filter)
- Fix wrapcheck lint errors in updateCombinedAttributionForCheckpoint
Verified end-to-end: 100% agent with config files committed alongside.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b0cb4216f6bc
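The baseline/session split from this commit message can be sketched roughly as follows. The `promptAttr` struct and both function names are hypothetical stand-ins (the real `PromptAttribution` fields are not shown in this excerpt); the `CheckpointNumber <= 1` rule for PA1 is the one stated in the PR description.

```go
package main

import "fmt"

// promptAttr is a simplified, hypothetical stand-in for a prompt attribution
// record. Field names are assumptions made for this sketch.
type promptAttr struct {
	CheckpointNumber int
	UserAdded        int
	UserRemoved      int
}

// splitBaseline separates baseline records (PA1, CheckpointNumber <= 1) from
// records captured during the session (PA2+).
func splitBaseline(pas []promptAttr) (baseline, session []promptAttr) {
	for _, pa := range pas {
		if pa.CheckpointNumber <= 1 {
			baseline = append(baseline, pa)
		} else {
			session = append(session, pa)
		}
	}
	return baseline, session
}

// humanTotals sums only session records, so pre-session dirt captured by PA1
// is not counted as human work.
func humanTotals(session []promptAttr) (added, removed int) {
	for _, pa := range session {
		added += pa.UserAdded
		removed += pa.UserRemoved
	}
	return added, removed
}

func main() {
	pas := []promptAttr{
		{CheckpointNumber: 1, UserAdded: 40, UserRemoved: 2}, // pre-session dirt
		{CheckpointNumber: 2, UserAdded: 3, UserRemoved: 1},
		{CheckpointNumber: 3, UserAdded: 5, UserRemoved: 0},
	}
	baseline, session := splitBaseline(pas)
	added, removed := humanTotals(session)
	fmt.Println(len(baseline), len(session), added, removed)
}
```

The baseline slice is still returned rather than discarded because, per the commit message, PA1 data continues to be subtracted from agent work.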
…ibution
Checkpoint package changes required by the attribution baseline fix:
- PromptAttributionsJSON field on WriteCommittedOptions and CommittedMetadata
- UpdateCheckpointSummary method on GitStore for multi-session aggregation
- CombinedAttribution field on CheckpointSummary
- Preserve existing CombinedAttribution during summary rewrites
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b8963737336c
…arentCommitHash
Fixes all 4 issues from Copilot and Cursor Bugbot review:
1. Precompute parentCommitHash on postCommitActionHandler struct using ParentHashes[0] (avoids extra object read, no silent error)
2. Remove duplicated 6-line parentCommitHash computation from HandleCondense and HandleCondenseIfFilesTouched
3. Thread parentTree through condenseOpts/attributionOpts and use it for non-agent file line counting. This ensures diffLines uses parent→HEAD (consistent with parentCommitHash file scoping) instead of sessionBase→HEAD, which over-counted intermediate commit changes
4. Add ParentTreeForNonAgentLines test proving the fix (TDD verified: HumanAdded=8 without fix → HumanAdded=3 with fix)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 12f5c4373467
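To make the over-counting in item 3 concrete, here is a small worked example. It is a sketch under loud assumptions: `addedLines` is a naive multiset line counter invented for illustration (not the repository's `diffLines`), and the file contents are made up, with sizes chosen to mirror the 8 versus 3 figures quoted in item 4.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLines splits text into lines, ignoring a single trailing newline.
func splitLines(s string) []string {
	if s == "" {
		return nil
	}
	return strings.Split(strings.TrimSuffix(s, "\n"), "\n")
}

// addedLines counts lines present in head but not in base, treating each
// side as a multiset of lines. It ignores ordering and is only a rough
// stand-in for a real line diff.
func addedLines(base, head string) int {
	remaining := map[string]int{}
	for _, l := range splitLines(base) {
		remaining[l]++
	}
	added := 0
	for _, l := range splitLines(head) {
		if remaining[l] > 0 {
			remaining[l]--
		} else {
			added++
		}
	}
	return added
}

func main() {
	sessionBase := "a\nb\n"
	// An intermediate commit adds five lines to a non-agent file.
	parent := sessionBase + "c1\nc2\nc3\nc4\nc5\n"
	// The commit being attributed adds three more.
	head := parent + "d1\nd2\nd3\n"

	fmt.Println("session base -> HEAD:", addedLines(sessionBase, head))
	fmt.Println("parent -> HEAD:", addedLines(parent, head))
}
```

Diffing from the session base attributes the intermediate commit's five lines to this commit as well, which is the inflation the parent→HEAD diff avoids.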
@BugBot review
Three fixes for multi-session attribution:
1. Cross-session file exclusion: Thread allAgentFiles (union of all sessions' FilesTouched) through the attribution pipeline. Files created by other agent sessions are no longer counted as human work.
2. Exclude .entire/ from commit session fallback: When the commit session has no FilesTouched and falls back to all committed files, filter out .entire/ metadata created by `entire enable`.
3. PA1 baseline uses base tree for new sessions: New sessions (StepCount == 0) always diff against the base commit tree, not the shared shadow branch which may contain other sessions' state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
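The cross-session exclusion in fix 1 amounts to building a set union and filtering against it. A minimal sketch, assuming a session-ID-to-files map as input; `unionFilesTouched` and `humanCandidates` are hypothetical names, and the file names are illustrative.

```go
package main

import "fmt"

// unionFilesTouched builds the set of files touched by any agent session.
// The input maps a session ID to that session's touched files; this shape
// is an assumption made for the sketch.
func unionFilesTouched(sessions map[string][]string) map[string]struct{} {
	all := make(map[string]struct{})
	for _, files := range sessions {
		for _, f := range files {
			all[f] = struct{}{}
		}
	}
	return all
}

// humanCandidates returns the changed files that no agent session touched,
// preserving input order. Only these are eligible to count as human work.
func humanCandidates(changed []string, allAgentFiles map[string]struct{}) []string {
	var out []string
	for _, f := range changed {
		if _, ok := allAgentFiles[f]; !ok {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	sessions := map[string][]string{
		"A": {"blue.md"},
		"B": {"red.md"},
	}
	all := unionFilesTouched(sessions)
	// red.md is excluded for session A even though only session B touched it.
	fmt.Println(humanCandidates([]string{"blue.md", "red.md", "notes.txt"}, all))
}
```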
…tering
- Test AllAgentFiles cross-session exclusion in CalculateAttributionWithAccumulated
- Test committedFilesExcludingMetadata filters .entire/ paths
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The combined_attribution field now diffs parent→HEAD once and classifies files as agent vs human based on the union of sessions with real checkpoints (SaveStep ran). Filters .entire/ and .claude/ config paths. Also adds ReadSessionMetadata for lightweight per-session metadata reads.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
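The file-level classification described in this commit could look roughly like the sketch below. All names here (`combined`, `classify`, `isConfigPath`) and the example paths are hypothetical; the input is assumed to be per-file added-line counts already computed from a single parent→HEAD diff.

```go
package main

import (
	"fmt"
	"strings"
)

// isConfigPath reports whether a path lives under .entire/ or .claude/,
// the two config prefixes the commit message says are filtered.
func isConfigPath(p string) bool {
	return strings.HasPrefix(p, ".entire/") || strings.HasPrefix(p, ".claude/")
}

// combined holds added-line totals for one checkpoint (hypothetical shape).
type combined struct {
	AgentAdded int
	HumanAdded int
}

// classify assigns each changed file's added lines to the agent if any
// session with real checkpoints touched the file, and to the human
// otherwise. Config paths are skipped entirely.
func classify(addedByFile map[string]int, agentFiles map[string]struct{}) combined {
	var c combined
	for path, added := range addedByFile {
		if isConfigPath(path) {
			continue
		}
		if _, ok := agentFiles[path]; ok {
			c.AgentAdded += added
		} else {
			c.HumanAdded += added
		}
	}
	return c
}

func main() {
	addedByFile := map[string]int{
		"blue.md":               10,
		"red.md":                7,
		"notes.txt":             3,
		".entire/settings.json": 12, // invented path, filtered out
	}
	agentFiles := map[string]struct{}{"blue.md": {}, "red.md": {}}
	fmt.Printf("%+v\n", classify(addedByFile, agentFiles))
}
```

Because the diff is taken once for the whole checkpoint, each file contributes to exactly one bucket, which avoids the double counting that summing per-session values would produce.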
gtrrz-victor approved these changes on Apr 7, 2026
Summary
Fixes attribution inflation that caused agent percentages to be incorrectly deflated, and adds holistic combined attribution for multi-session checkpoints. Supersedes #813.
Problems Fixed
1. Intermediate commit inflation
When calculating non-agent file contributions, the code used `attributionBaseCommit → headCommitHash` (session start → current commit). In multi-commit sessions, this counted ALL user edits since session start instead of just this commit's changes.
Fix: Use `parentCommitHash → headCommitHash` (first parent → current commit) so only this commit's non-agent file changes are counted.
2. Pre-session worktree dirt
PA1 (first PromptAttribution) captures the worktree state at session start, which includes files already dirty before the session (CLI config from `entire enable`, leftover changes). These were incorrectly added to human contribution counts.
Fix: Split prompt attributions into baseline (PA1, `CheckpointNumber <= 1`) and session (PA2+). PA1 is used for agent line correction but excluded from human counts.
3. Multi-session cross-contamination
In multi-session commits (e.g., 3 `claude -p` calls then commit), each session's attribution counted files from OTHER agent sessions as human work. Session A creating `blue.md` would see Session B's `red.md` as human-added.
Fix: Thread `allAgentFiles` (union of all sessions' `FilesTouched`) through the attribution pipeline. Files touched by any agent session are excluded from human counts in all sessions.
4. Commit session claiming `.entire/` config files
When a session with no `FilesTouched` (e.g., a commit-only session) falls back to using all committed files, `.entire/` config files from `entire enable` were included as agent work.
Fix: Filter `.entire/` paths from the fallback in `committedFilesExcludingMetadata()`.
5. PA1 using shared shadow branch for new sessions
New sessions incorrectly used the shadow branch (which may contain other sessions' checkpoint data) as the PA1 reference. This caused pre-session worktree dirt to be missed because it already existed in the shared shadow tree.
Fix: New sessions (`StepCount == 0`) always diff against the base commit tree for PA1, not the shadow branch.
6. Combined attribution (replaces #813)
Added holistic `combined_attribution` field on `CheckpointSummary` for multi-session checkpoints. Instead of naively summing per-session values (which double-counts), this diffs `parent → HEAD` once and classifies files as agent vs human based on the union of sessions with real checkpoints (`CheckpointsCount > 0`). Filters `.entire/` and `.claude/` config paths.
Test Results
Files Changed
- `manual_commit_attribution.go`: `AllAgentFiles` on `AttributionParams`, `isAgentOrMetadataFile()` helper, `classifyBaselineEdits()` for PA1 split
- `manual_commit_condensation.go`: Thread `allAgentFiles` through opts, `committedFilesExcludingMetadata()` for `.entire/` filtering
- `manual_commit_hooks.go`: Compute `allAgentFiles` union in PostCommit, PA1 shadow branch guard (`StepCount > 0`), holistic `updateCombinedAttributionForCheckpoint()`
- `checkpoint/committed.go`: `ReadSessionMetadata()`, `UpdateCheckpointSummary()`
- `checkpoint/checkpoint.go`: `CombinedAttribution` field on `CheckpointSummary`