feat: multi-signal checkpoint linkage for resilience across git rewrites #840

Open
peyton-alt wants to merge 12 commits into main from
peyton/ent-834-tree-hash-checkpoint-linkage

@peyton-alt peyton-alt commented Apr 3, 2026

Summary

Store multiple content-based linkage signals in checkpoint metadata so the web can automatically re-link checkpoints after git history rewrites (rebase, reword, amend, filter-branch) without user intervention.

  • Add linkage block to CheckpointSummary on entire/checkpoints/v1 with four signals: tree_hash, patch_id, files_changed_hash, session_files_hash
  • Allow checkpoint trailers on agent-initiated revert/cherry-pick (when session is ACTIVE)
  • Remove per-session TreeHash from CommittedMetadata (superseded by checkpoint-level linkage)
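
The trailer rule for revert/cherry-pick can be sketched as a small decision function. This is an illustration only, not the hook's actual code: the `Session` type, the `IDLE` state, and the function names are hypothetical, and the sketch ignores every other check the real prepare-commit-msg hook performs.

```go
package main

import "fmt"

// SessionState is a hypothetical stand-in for the CLI's session state.
// Only ACTIVE is named in the PR; IDLE is an assumed placeholder.
type SessionState string

const (
	StateActive SessionState = "ACTIVE"
	StateIdle   SessionState = "IDLE"
)

// Session is a hypothetical, minimal view of a session in the current worktree.
type Session struct {
	ID    string
	State SessionState
}

// hasActiveSession reports whether any session in the worktree is ACTIVE.
func hasActiveSession(sessions []Session) bool {
	for _, s := range sessions {
		if s.State == StateActive {
			return true
		}
	}
	return false
}

// shouldAddTrailer applies the rule from the summary: during a sequence
// operation (revert, cherry-pick) the trailer is added only when an agent
// session is ACTIVE. Outside sequence operations this sketch always proceeds.
func shouldAddTrailer(inSequenceOp bool, sessions []Session) bool {
	if !inSequenceOp {
		return true
	}
	return hasActiveSession(sessions)
}

func main() {
	agent := []Session{{ID: "s1", State: StateActive}}
	user := []Session{{ID: "s2", State: StateIdle}}
	fmt.Println(shouldAddTrailer(true, agent)) // agent-initiated revert
	fmt.Println(shouldAddTrailer(true, user))  // user-initiated revert
}
```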

Problem (#834)

When users rewrite git history (git rebase, git rebase -i reword, git filter-branch), the Entire-Checkpoint trailer can be stripped from the commit message. The checkpoint data still exists on entire/checkpoints/v1, but the rewritten commit no longer points to it.

Tree hash alone doesn't survive rebase — rebasing onto a new base changes the full tree snapshot even if the feature's own changes are identical.

How multi-signal linkage fixes this

Each signal captures a different aspect of the commit's identity. The web uses a fallback chain:

| Signal | What it hashes | Survives |
| --- | --- | --- |
| tree_hash | Full repo snapshot | Reword, amend (msg-only), filter-branch (msg-only) |
| patch_id | Just the diff content (via git patch-id --stable) | Clean rebase, cherry-pick |
| files_changed_hash | SHA256 of agent-touched file:blob pairs | Rebase with conflicts in other files |
| session_files_hash | SHA256 of all session file:blob pairs | Local squash merge |

The only case with no match is when the agent's actual code was modified (e.g., conflict resolution in agent-touched files) — which is semantically correct.
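
To make the fallback chain concrete, here is a minimal sketch of the lookup order in Go. The web side is not part of this PR and may be written in another language; the `Linkage`, `Checkpoint`, and `Match` names below are hypothetical, and only the signal names and their order (tree_hash, then patch_id, then files_changed_hash, then session_files_hash) come from the description.

```go
package main

import "fmt"

// Linkage holds the four signals named in the PR description.
type Linkage struct {
	TreeHash, PatchID, FilesChangedHash, SessionFilesHash string
}

// Checkpoint is a hypothetical stored record with its linkage signals.
type Checkpoint struct {
	ID      string
	Linkage Linkage
}

// signal pairs a signal name with an accessor for its value.
type signal struct {
	name string
	get  func(Linkage) string
}

// fallbackChain lists the signals in the order they are tried.
var fallbackChain = []signal{
	{"tree_hash", func(l Linkage) string { return l.TreeHash }},
	{"patch_id", func(l Linkage) string { return l.PatchID }},
	{"files_changed_hash", func(l Linkage) string { return l.FilesChangedHash }},
	{"session_files_hash", func(l Linkage) string { return l.SessionFilesHash }},
}

// Match returns the first stored checkpoint that agrees with the commit on
// the earliest signal in the chain. Empty signals never match.
func Match(commit Linkage, stored []Checkpoint) (id, by string, ok bool) {
	for _, s := range fallbackChain {
		want := s.get(commit)
		if want == "" {
			continue
		}
		for _, cp := range stored {
			if s.get(cp.Linkage) == want {
				return cp.ID, s.name, true
			}
		}
	}
	return "", "", false
}

func main() {
	stored := []Checkpoint{{ID: "cp-1", Linkage: Linkage{TreeHash: "aaa", PatchID: "ppp"}}}
	// After a clean rebase the tree hash differs but the patch ID is unchanged.
	rebased := Linkage{TreeHash: "bbb", PatchID: "ppp"}
	id, by, ok := Match(rebased, stored)
	fmt.Println(id, by, ok) // prints: cp-1 patch_id true
}
```

A commit whose agent-touched files were edited during conflict resolution would carry different values for every signal, so `Match` reports no match, which is the intended outcome described above.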

What's in this PR (CLI-side)

  • LinkageMetadata struct with four fixed-size hash fields (all omitempty for clean JSON)
  • ComputePatchID: pipes git diff-tree -p through git patch-id --stable
  • ComputeFilesChangedHash: single git ls-tree call, SHA256 of sorted file:blob pairs
  • Commit-level linkage computed once per PostCommit via computeBaseLinkage() (cached), with SessionFilesHash added per-session via linkageForSession()
  • Written to root-level CheckpointSummary on entire/checkpoints/v1
  • Agent-initiated git revert/cherry-pick gets checkpoint trailer when session is ACTIVE (condensation deferred to next normal commit)
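
As an illustration of the files_changed_hash idea, the sketch below hashes sorted file:blob pairs taken from `git ls-tree -r` output. It is not the PR's `ComputeFilesChangedHash`: the function name `hashLsTree`, the `path:blob` ordering within a pair, and the newline separator are assumptions. Only the overall shape (SHA256 over sorted pairs, an error on malformed lines) comes from the description and commit notes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// hashLsTree takes the text printed by `git ls-tree -r <tree> -- <files...>`,
// where each line is "<mode> <type> <object>\t<path>", and returns the
// hex-encoded SHA256 of the sorted "path:object" pairs.
func hashLsTree(lsTreeOutput string) (string, error) {
	var pairs []string
	for _, line := range strings.Split(strings.TrimSpace(lsTreeOutput), "\n") {
		if line == "" {
			continue
		}
		meta, path, ok := strings.Cut(line, "\t")
		fields := strings.Fields(meta)
		if !ok || len(fields) != 3 {
			// Fail loudly rather than silently skipping a bad line.
			return "", fmt.Errorf("malformed ls-tree line: %q", line)
		}
		pairs = append(pairs, path+":"+fields[2])
	}
	// Sorting makes the hash independent of the order git lists the files in.
	sort.Strings(pairs)
	sum := sha256.Sum256([]byte(strings.Join(pairs, "\n")))
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	// Sample input with made-up blob IDs.
	out := "100644 blob 1111111\tcmd/main.go\n100644 blob 2222222\tREADME.md\n"
	h, err := hashLsTree(out)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(h)
}
```

Because only the listed files and their blob IDs feed the hash, changes to other files (for example, conflicts resolved elsewhere during a rebase) leave the value unchanged.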

What's needed on the web side

  • DB migration: add patch_id, files_changed_hash, session_files_hash columns to repo_checkpoints
  • Webhook: extend fallback chain beyond tree hash to include patch_id → files_changed_hash → session_files_hash
  • Lazy backfill for new signals on existing checkpoints

Testing

Automated

  • TestComputePatchID / TestComputePatchID_StableAcrossRebase / TestComputePatchID_InitialCommit
  • TestComputeFilesChangedHash / TestComputeFilesChangedHash_StableAcrossRebase
  • TestWriteCommitted_IncludesLinkage / TestWriteCommitted_NilLinkageOmitted
  • TestShadowStrategy_PostCommit_LinkagePopulated — full pipeline integration test
  • TestShadowStrategy_PrepareCommitMsg_AgentRevertGetsTrailer / _UserRevertSkipped
  • Full CI passes (unit + integration + E2E canary)

Manual verification (local binary)

Tested with a real Claude Code session against a local repo:

  1. Linkage block populated — after a Claude Code commit, the entire/checkpoints/v1 metadata contains the full linkage block:
{
  "linkage": {
    "tree_hash": "d82e8f38e91d7e1efca4993ff7c4023313a55292",
    "patch_id": "e00217d29a078e2bd1e2d16e289a9cdc78c41df7",
    "files_changed_hash": "4c22e14d5ce80bdd2f..."
  }
}
  2. Patch ID survives rebase: created a feature branch, committed, rebased onto main with new commits. Patch ID 7a94b75a... was identical before and after rebase, while tree hash changed (expected).

  3. Tree hash survives reword: git commit --amend -m "new msg" preserved the tree hash exactly.
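
The patch ID check can be reproduced with a small helper that pipes `git diff-tree -p` through `git patch-id --stable`, as the PR describes for ComputePatchID. This is a sketch under assumptions, not the PR's code: the function names are hypothetical, and it does not handle the initial-commit case covered by TestComputePatchID_InitialCommit.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// parsePatchID extracts the patch ID from `git patch-id` output, which has
// the form "<patch-id> <commit-id>".
func parsePatchID(out string) (string, error) {
	id, _, _ := strings.Cut(strings.TrimSpace(out), " ")
	if id == "" {
		return "", errors.New("empty patch-id output")
	}
	return id, nil
}

// computePatchID runs `git diff-tree -p <commit>` in repoDir and feeds the
// resulting patch to `git patch-id --stable`.
func computePatchID(repoDir, commit string) (string, error) {
	patch, err := exec.Command("git", "-C", repoDir, "diff-tree", "-p", commit).Output()
	if err != nil {
		return "", fmt.Errorf("git diff-tree: %w", err)
	}
	cmd := exec.Command("git", "patch-id", "--stable")
	cmd.Stdin = bytes.NewReader(patch)
	out, err := cmd.Output()
	if err != nil {
		return "", fmt.Errorf("git patch-id: %w", err)
	}
	return parsePatchID(string(out))
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: patchid <repo-dir> <commit>")
		return
	}
	id, err := computePatchID(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(id)
}
```

Running it against the same commit before and after a clean rebase should print the same ID, while `git rev-parse <commit>^{tree}` differs between the two.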

Test plan

  • Verify linkage block appears in metadata.json on entire/checkpoints/v1 after a commit
  • Verify patch ID is stable across a clean rebase (same value before/after)
  • Verify tree hash survives reword (amend message only)
  • Verify old checkpoints without linkage still work (nil omitted, backward compat)
  • Web companion PR: verify fallback chain matches rewritten commits
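
The backward-compatibility item relies on the linkage block being omitted when absent. A minimal sketch of that behavior with `encoding/json` follows. The JSON keys for the four signals come from the description; the `CheckpointSummary` fields other than `Linkage`, the `checkpoint_id` key, the plain string field types, and the helper names are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LinkageMetadata carries the four signals; omitempty drops unset ones.
type LinkageMetadata struct {
	TreeHash         string `json:"tree_hash,omitempty"`
	PatchID          string `json:"patch_id,omitempty"`
	FilesChangedHash string `json:"files_changed_hash,omitempty"`
	SessionFilesHash string `json:"session_files_hash,omitempty"`
}

// CheckpointSummary is reduced to what the sketch needs. A nil Linkage
// pointer is left out of the JSON entirely.
type CheckpointSummary struct {
	CheckpointID string           `json:"checkpoint_id"`
	Linkage      *LinkageMetadata `json:"linkage,omitempty"`
}

// encode marshals a summary to a JSON string.
func encode(s CheckpointSummary) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

// decode parses a summary; metadata written before this change simply
// yields a nil Linkage.
func decode(raw string) (CheckpointSummary, error) {
	var s CheckpointSummary
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

func main() {
	withoutLinkage, _ := encode(CheckpointSummary{CheckpointID: "cp-1"})
	fmt.Println(withoutLinkage) // prints: {"checkpoint_id":"cp-1"}

	withLinkage, _ := encode(CheckpointSummary{
		CheckpointID: "cp-1",
		Linkage:      &LinkageMetadata{TreeHash: "abc"},
	})
	fmt.Println(withLinkage) // prints: {"checkpoint_id":"cp-1","linkage":{"tree_hash":"abc"}}

	old, _ := decode(`{"checkpoint_id":"cp-0"}`)
	fmt.Println(old.Linkage == nil) // prints: true
}
```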

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 3, 2026 00:25
@peyton-alt peyton-alt requested a review from a team as a code owner April 3, 2026 00:25

Copilot AI left a comment

Pull request overview

Stores a commit’s git tree hash in checkpoint metadata during condensation so the web side can re-link checkpoints after history rewrites that drop the Entire-Checkpoint trailer, and tweaks prepare-commit-msg behavior so agent-driven revert/cherry-pick operations still get checkpoint trailers when a session is active.

Changes:

  • Add tree_hash to committed checkpoint metadata (CommittedMetadata, WriteCommittedOptions) and populate it during PostCommit condensation from the HEAD commit’s tree hash.
  • Allow prepare-commit-msg to proceed during git sequence operations (revert/cherry-pick/rebase) when an ACTIVE session exists in the current worktree; keep skipping when no active session.
  • Add unit tests for tree_hash persistence and the new revert trailer behavior split by agent-active vs user/manual.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| cmd/entire/cli/strategy/manual_commit_test.go | Adds tests ensuring agent revert gets a trailer when session is ACTIVE and user revert is skipped when not active. |
| cmd/entire/cli/strategy/manual_commit_session.go | Adds helper to detect whether any session in the current worktree is ACTIVE. |
| cmd/entire/cli/strategy/manual_commit_hooks.go | Adjusts PrepareCommitMsg sequence-operation skip logic; passes commit tree hash into condensation options. |
| cmd/entire/cli/strategy/manual_commit_condensation.go | Threads treeHash through condensation options into committed checkpoint write options. |
| cmd/entire/cli/checkpoint/committed.go | Writes TreeHash into per-session committed metadata.json. |
| cmd/entire/cli/checkpoint/checkpoint.go | Extends checkpoint option/metadata structs to include TreeHash serialized as tree_hash. |
| cmd/entire/cli/checkpoint/checkpoint_test.go | Adds coverage that tree_hash is written and read back from committed metadata. |

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

peyton-alt added a commit that referenced this pull request Apr 3, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pjbgf added a commit that referenced this pull request Apr 3, 2026
Add checkpoint linkage preservation after history rewrites (#840) and
fail-closed content detection in prepare-commit-msg (#826). Update
release date to 2026-04-03.

Signed-off-by: Paulo Gomes <paulo@entire.io>
Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 0154164ac0f9
@peyton-alt peyton-alt marked this pull request as draft April 7, 2026 05:49
@peyton-alt peyton-alt force-pushed the peyton/ent-834-tree-hash-checkpoint-linkage branch from 7b9d87d to 8302847 Compare April 7, 2026 06:17
@peyton-alt peyton-alt changed the title fix: store tree hash in checkpoint metadata for linkage resilience feat: multi-signal checkpoint linkage for resilience across git rewrites Apr 7, 2026
@peyton-alt peyton-alt force-pushed the peyton/ent-834-tree-hash-checkpoint-linkage branch 2 times, most recently from 182d479 to e70d0eb Compare April 7, 2026 20:46
peyton-alt and others added 12 commits April 7, 2026 16:54
When an agent runs git revert or cherry-pick as part of its work, the
commit should be checkpointed. Previously prepare-commit-msg
unconditionally skipped during sequence operations, making the agent's
work invisible to Entire.

Now checks for active sessions: if an agent session is ACTIVE, the
operation is agent-initiated and gets a trailer. If no active session,
it's user-initiated and is skipped as before.

Part of fix for #834.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 85df9ac94bc7
Add tree_hash field to committed checkpoint metadata. Records the git
tree hash of the commit being condensed, enabling fallback checkpoint
lookup by tree hash when the Entire-Checkpoint trailer is stripped by
git history rewrites (rebase, filter-branch, amend).

Part of fix for #834.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 77773a25069e
- Add debug logging to hasActiveSessionInWorktree error paths
- Remove unrelated files (greetings.md, agent configs) from PR

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: bacd9b68b1c0
Define content-based linkage signals (tree_hash, patch_id,
files_changed_hash, session_files_hash) for re-linking checkpoints
after git history rewrites. Stored at checkpoint level, not per-session.

Part of fix for #834.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 4661e8c50610
Needed by linkage signal tests that verify patch ID stability across rebase.

Part of fix for #834.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 4129c08c80b0
ComputePatchID: git patch-id of the commit diff, survives rebase.
ComputeFilesChangedHash: SHA256 of sorted file:blob pairs, survives
rebase even with conflicts in non-agent files. Uses single git ls-tree
call for all files (O(1) subprocess).

Part of fix for #834.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: c100b592e3a0
Replace per-session TreeHash with checkpoint-level LinkageMetadata
containing tree_hash, patch_id, files_changed_hash, and session_files_hash.
Computed in PostCommit handlers, passed through condenseOpts to
CondenseSession, written to CheckpointSummary on entire/checkpoints/v1.

Part of fix for #834.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 75ce05cfe11b
Verify LinkageMetadata is stored in CheckpointSummary and readable.
Also verify nil linkage is omitted (backward compat with old checkpoints).

Part of fix for #834.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 10dae87903d7
gofmt stripped nolint directives from capabilities.go. Restore from main.
Add encoding/hex import for ComputeFilesChangedHash.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: c26d3dce1d32
- Restore nolint:ireturn on capabilities.go (gofmt stripped them)
- Set user.name/email in gitops initTestRepo for CI compatibility
  (git rebase fails without repo-level config on CI runners)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: f1ca53b63c79
- Add omitempty to all LinkageMetadata JSON tags for consistency
- Return error for malformed git ls-tree lines instead of silent skip
- Compute commit-level linkage once (not per-session) via baseLinkage
  cache; only SessionFilesHash varies per session
- Add code comment explaining deferred condensation for agent reverts
- Add integration test verifying full linkage pipeline (PostCommit →
  condensation → ReadCommitted with all four signals populated)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: e3539c5bfa31
- Replace unreachable Fields/len guard with strings.Cut in ComputePatchID
- Use logCtx variable in linkageForSession for logging consistency
- Use strings.TrimSpace in revParse test helper instead of raw byte slice

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 91b06c4aabdb
@peyton-alt peyton-alt force-pushed the peyton/ent-834-tree-hash-checkpoint-linkage branch from 0dfcba0 to 703bd67 Compare April 7, 2026 23:55
@peyton-alt peyton-alt marked this pull request as ready for review April 8, 2026 00:17