Skip to content

Fix attribution inflation from intermediate commits#870

Merged
Soph merged 1 commit intomainfrom
fix/attribution-intermediate-commit-inflation-squash
Apr 8, 2026
Merged

Fix attribution inflation from intermediate commits#870
Soph merged 1 commit intomainfrom
fix/attribution-intermediate-commit-inflation-squash

Conversation

@Soph
Copy link
Copy Markdown
Collaborator

@Soph Soph commented Apr 8, 2026

Due to #869 this is a squashed version of that removes the accidentally committed .tmp: #812


Note

Medium Risk
Changes core attribution math and checkpoint metadata persistence, which could affect reported agent/human line counts across sessions and commits. Risk is moderated by added tests, but regressions could impact analytics/diagnostics for existing checkpoints.

Overview
Fixes attribution inflation in manual-commit by (1) counting non-agent file changes against parent→HEAD when available (so intermediate commits aren’t double-counted), (2) excluding pre-session “baseline” edits captured in PA1 from human contribution totals, and (3) excluding files touched by other concurrent agent sessions from being misclassified as human work.

Persists raw per-prompt attribution data (prompt_attributions) into per-session metadata.json for diagnostics, and adds a root-level combined_attribution on checkpoint metadata.json computed once per multi-session commit (with a new UpdateCheckpointSummary path and lightweight ReadSessionMetadata). Also tightens files_touched fallback to avoid attributing CLI metadata files, and updates .gitignore to ignore .tmp/.

Reviewed by Cursor Bugbot for commit b221159. Configure here.

Copilot AI review requested due to automatic review settings April 8, 2026 14:39
@Soph Soph marked this pull request as ready for review April 8, 2026 14:42
@Soph Soph requested a review from a team as a code owner April 8, 2026 14:42
@Soph Soph enabled auto-merge April 8, 2026 14:44
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes incorrect attribution math in manual-commit sessions by preventing intermediate-commit inflation, excluding pre-session worktree dirt from “human” counts, and adding a holistic combined attribution for multi-session checkpoints.

Changes:

  • Adjust attribution inputs/classification to avoid counting changes outside the current commit and to exclude PA1 (pre-session) baseline from human totals.
  • Thread cross-session allAgentFiles through attribution to prevent multi-session “human” inflation and add combined_attribution persisted on checkpoint summaries.
  • Persist per-prompt attribution diagnostics into session metadata and add tests covering baseline/multi-session cases.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
cmd/entire/cli/strategy/manual_commit_test.go Updates expectations and adds tests for metadata filtering in committed-files fallback.
cmd/entire/cli/strategy/manual_commit_hooks.go Computes allAgentFiles, guards PA1 shadow-branch usage, and computes/persists combined attribution.
cmd/entire/cli/strategy/manual_commit_condensation.go Filters .entire/ out of fallback FilesTouched, threads allAgentFiles, and persists prompt-attribution diagnostics.
cmd/entire/cli/strategy/manual_commit_attribution.go Adds cross-session/metadata exclusion and PA1 baseline subtraction logic.
cmd/entire/cli/strategy/manual_commit_attribution_test.go Adds regression tests for baseline dirt, parent-tree diffing, and cross-session exclusion.
cmd/entire/cli/checkpoint/v2_committed.go Persists prompt-attribution diagnostics into v2 session metadata.
cmd/entire/cli/checkpoint/committed.go Persists prompt-attribution diagnostics, adds combined attribution preservation + update API, and adds session-metadata read helper.
cmd/entire/cli/checkpoint/checkpoint.go Adds CombinedAttribution to checkpoint summaries and stores prompt-attribution diagnostics in committed metadata.
.gitignore Ignores .tmp/ directory.
Comments suppressed due to low confidence (1)

cmd/entire/cli/strategy/manual_commit_attribution.go:243

  • Baseline (PA1) subtraction is only applied to accumulated user additions, but accumulated user removals still include PA1. If a worktree starts with pre-session deletions (PA1 UserLinesRemoved / UserRemovedPerFile), those removals will still be counted as in-session human_removed and will skew attribution. Consider tracking baseline removals per file and subtracting them from classified.removedFromAgentFiles / classified.removedFromCommittedNonAgent (or from totalUserRemoved) the same way additions are handled.
	totalUserAdded := relevantAccumulatedUser + agentDiffs.postCheckpointUserAdded + postToNonAgentFiles
	// Use per-file filtered removals (symmetric with totalUserAdded) to avoid
	// double-counting non-agent removals that also appear in nonAgent.userRemovedFromNonAgentFiles.
	relevantAccumulatedRemoved := classified.removedFromAgentFiles + classified.removedFromCommittedNonAgent
	totalUserRemoved := relevantAccumulatedRemoved + agentDiffs.postCheckpointUserRemoved

@Soph Soph merged commit ad5ca27 into main Apr 8, 2026
8 checks passed
@Soph Soph deleted the fix/attribution-intermediate-commit-inflation-squash branch April 8, 2026 14:46
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit b221159. Configure here.

TokenUsage: sessionData.TokenUsage,
SessionMetrics: buildSessionMetrics(state),
InitialAttribution: attribution,
PromptAttributionsJSON: marshalPromptAttributions(state.PromptAttributions),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diagnostic JSON omits PendingPromptAttribution for mid-turn commits

Medium Severity

marshalPromptAttributions(state.PromptAttributions) persists only the already-finalized prompt attributions, but the attribution calculation in calculateSessionAttributions appends state.PendingPromptAttribution when non-nil (mid-turn commits like Codex). For those commits, state.PromptAttributions is empty since SaveStep never ran, so the diagnostic field will be nil despite the calculation using real PA data. The PromptAttributionsJSON field's purpose is to show "exactly which prompt recorded which 'user' lines" — it won't fulfill that for mid-turn commits.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit b221159. Configure here.

// Returns nil if no hook-provided metrics exist (e.g., for agents that don't report them).
// marshalPromptAttributions encodes PromptAttributions to JSON for diagnostic persistence.
// Returns nil if there are no attributions to persist.
func marshalPromptAttributions(pas []PromptAttribution) json.RawMessage {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Godoc comment attached to wrong function after insertion

Low Severity

The marshalPromptAttributions function was inserted between the existing godoc comment for buildSessionMetrics and the buildSessionMetrics function itself. Now marshalPromptAttributions has buildSessionMetrics's documentation as its godoc prefix, and buildSessionMetrics (line 328) has no godoc at all.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit b221159. Configure here.

// baseline tracks PA1 (CheckpointNumber <= 1) edits separately.
// PA1 captures pre-session worktree dirt that existed before the agent started,
// so it should be excluded from human contribution counts.
baselineUserRemoved int
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field baselineUserRemoved is accumulated but never read

Low Severity

The new baselineUserRemoved field in accumulatedEdits is populated in accumulatePromptEdits but never read anywhere. Its counterpart baselineUserAddedPerFile is correctly used in classifyBaselineEdits, but baseline removals are neither classified nor subtracted from relevantAccumulatedRemoved. This is dead code that may indicate an incomplete implementation of baseline removal exclusion.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit b221159. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants