Retry version-history commits on a busy git index.lock by bdraco · Pull Request #1631 · esphome/device-builder

bdraco · 2026-06-20T20:40:18Z

What does this implement/fix?

When several dashboard instances share one config-dir git repo (a common HA setup: the prod, beta, and dev addons all running against one /config/esphome), a real external edit is detected by all of them at once and they collide on .git/index.lock. The loser raised, and because each collision counted as a commit failure, repeated collisions on the secondary instances would eventually trip the degraded flag.

A fresh (live) index.lock now raises a typed GitIndexLockBusyError from the synchronous git layer instead of failing outright. The async commit() catches it, backs off on the event loop, and retries the whole commit; the executor thread is not held during the wait. Whichever instance wins the lock commits the edit; the others retry, find nothing staged, and no-op cleanly. Stale-lock recovery (a crashed prior run) is unchanged.

This pairs with #1628, which removes the bulk of the contention by only committing on real YAML edits; this PR makes the genuine concurrent-edit case robust.

Related issue or feature (if applicable):

N/A; reported from production logs.

Types of changes

Bugfix (non-breaking change which fixes an issue) — bugfix
New feature (non-breaking change which adds functionality) — new-feature
Enhancement to an existing feature — enhancement
Breaking change (fix or feature that would cause existing functionality to not work as expected) — breaking-change
Refactor (no behaviour change) — refactor
Documentation only — docs
Maintenance / chore — maintenance
CI / workflow change — ci
Dependencies bump — dependencies

Frontend coordination

No frontend change needed
Companion frontend PR: esphome/device-builder-frontend#

Checklist

The code change is tested and works locally.
Pre-commit hooks pass (ruff, codespell, yaml/json/python checks).
Tests have been added or updated under tests/ where applicable.
components.index.json / definitions/components/*.json have not been hand-edited (regenerate via script/sync_components.py if a sync is needed).
Architecture-level changes are reflected in docs/ARCHITECTURE.md and/or docs/API.md.

When several dashboard instances share one config-dir git repo (a common HA setup: prod, beta, and dev addons against one /config/esphome), a real external edit is picked up by all of them at once and they collide on .git/index.lock. The loser raised, and repeated collisions on the secondary instances would trip the degraded flag. A fresh (live) index.lock now raises a typed GitIndexLockBusyError from the sync git layer; the async commit() backs off on the event loop and retries the whole commit, so the executor thread isn't held during the wait. Whichever instance wins commits the edit; the others retry, find nothing staged, and no-op cleanly. Stale-lock recovery is unchanged.

codspeed-hq · 2026-06-20T20:42:36Z

Merging this PR will not alter performance

✅ 27 untouched benchmarks

_{Comparing version-history-retry-fresh-index-lock (b4b36c9) with main (3e78e3d)¹}

No successful run was found on main (9ae683a) during the generation of this report, so 3e78e3d was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩

codecov · 2026-06-20T20:43:41Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.54%. Comparing base (f4255cd) to head (b4b36c9).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1631   +/-   ##
=======================================
  Coverage   99.54%   99.54%           
=======================================
  Files         227      227           
  Lines       17955    17993   +38     
=======================================
+ Hits        17874    17912   +38     
  Misses         81       81

Flag	Coverage Δ
py3.12	`99.52% <100.00%> (+<0.01%)`	⬆️
py3.14	`99.54% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
..._builder/controllers/version_history/controller.py	`100.00% <100.00%> (ø)`
...ce_builder/controllers/version_history/git_repo.py	`100.00% <100.00%> (ø)`

... and 9 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot

Pull request overview

This PR improves the robustness of version-history commits when multiple dashboard instances share the same config-dir git repository by treating a live .git/index.lock as a retryable condition rather than a hard commit failure.

Changes:

Introduce GitIndexLockBusyError to represent fresh index.lock contention (live concurrent writer).
Update the async VersionHistoryController.commit() to back off on the event loop and retry commits when the lock is busy.
Add targeted unit tests covering the new error mapping and controller retry behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
`esphome_device_builder/controllers/version_history/git_repo.py`	Adds a typed “busy index.lock” error and updates `_run_write` to map lock contention into that error type.
`esphome_device_builder/controllers/version_history/controller.py`	Adds bounded async retry/backoff around git commits when `GitIndexLockBusyError` is raised.
`tests/controllers/version_history/test_git_repo.py`	Updates existing lock-related tests and adds new unit tests for `_run_write` busy/stale/non-lock error handling.
`tests/controllers/version_history/test_controller.py`	Adds async tests to verify the retry loop behavior and failure behavior after exhausting retries.

esphbot · 2026-06-20T21:02:46Z

~~Previous review~~ — superseded by a newer review below.

esphbot

No blocking issues found.

Gate the busy-retry on lock freshness: a lock aged past the stale threshold that we could not clear (an adopted repo we won't touch, or a managed lock we failed to unlink) will not free itself, so surface the original error immediately instead of spinning through the bounded backoff. A fresh, vanished, or unresolvable lock stays retryable.

esphbot · 2026-06-20T21:29:37Z

~~Previous review~~ — superseded by a newer review below.

esphbot

No blocking issues found.

esphbot · 2026-06-20T21:39:33Z

PR Review — Retry version-history commits on a busy git index.lock

Clean, well-scoped concurrency fix. Merge-ready.

Strengths:

The new _run_write is a bounded while True loop with a cleared_stale one-shot flag: it clears a stale lock at most once, then every path either returns or raises (max 2 iterations) — no infinite-spin risk, verified by the four _run_write unit tests.
Error classification is precise: fresh lock (live writer) → retryable GitIndexLockBusyError; stale-but-unclearable (adopted repo / unlinkable lock) → original CalledProcessError propagates and counts as a real failure. The TOCTOU case (lock vanishes before the freshness stat) correctly degrades to 'fresh → retry resolves it'.
Backoff is awaited on the loop (await asyncio.sleep), never inside _in_executor, so the executor thread isn't pinned during the ~3s bounded wait — matches the PR's stated goal.
Except ordering in commit() is load-bearing and correct: GitIndexLockBusyError (a CalledProcessError subclass) is caught before the GIT_COMMIT_ERRORS catch-all.
Retry idempotency holds: a loser instance re-stages pathspec-scoped and no-ops via git diff --cached --quiet, exactly as described.
Test coverage is thorough across both layers (sync _run_write busy/stale/non-lock/post-clear-recollision, async controller retry-then-succeed and give-up-after-N).

Notes:

The earlier @esphbot review's suggestion Add section config API and user preferences #1 (post-clear in-place retry not mapped to busy) is already resolved in this diff — the loop continue reclassifies a post-clear re-collision as busy, pinned by test_run_write_reclassifies_a_post_clear_recollision_as_busy. No action needed.
Copilot's docstring-style note on GitIndexLockBusyError is incorrect for this repo: the class docstring is single-line, which is the documented default. No change needed.
Minor/by-design: self._lock is held across the up-to-~3s backoff, so subsequent commits on the same instance queue behind a retrying one. This preserves commit ordering and only triggers on genuine cross-instance collisions — intentional, not a defect.

Checklist

Error handling — correct except ordering, no silent swallow
Concurrency — backoff does not hold executor thread
Retry bounds finite (no infinite loop)
Retry idempotency (loser no-ops cleanly)
Stale-unclearable lock still propagates as hard failure
Test coverage for new branches
Docstring style matches repo convention

Automated review by Kōan (Claude) HEAD=b4b36c9 2 min 9s

esphbot

No blocking issues found.

bdraco added the enhancement Improvement to an existing feature label Jun 20, 2026

Drop the unreachable post-loop return in commit retry

20b86b7

bdraco requested a review from Copilot June 20, 2026 20:50

Copilot started reviewing on behalf of bdraco June 20, 2026 20:52 View session

Copilot AI reviewed Jun 20, 2026

View reviewed changes

Comment thread esphome_device_builder/controllers/version_history/git_repo.py Outdated

Comment thread esphome_device_builder/controllers/version_history/git_repo.py Outdated

Dedup the index.lock predicate and tighten comments

18f8028

esphbot approved these changes Jun 20, 2026

View reviewed changes

Clarify busy-lock classification; pin its error type; terse docstring

0880d7d

bdraco marked this pull request as ready for review June 20, 2026 21:17

esphbot approved these changes Jun 20, 2026

View reviewed changes

Reclassify a post-stale-clear lock re-collision as retryable busy

b4b36c9

esphbot approved these changes Jun 20, 2026

View reviewed changes

bdraco merged commit e53f57f into main Jun 20, 2026
21 checks passed

bdraco deleted the version-history-retry-fresh-index-lock branch June 20, 2026 21:42

github-actions Bot locked and limited conversation to collaborators Jun 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Retry version-history commits on a busy git index.lock#1631

Retry version-history commits on a busy git index.lock#1631
bdraco merged 6 commits into
mainfrom
version-history-retry-fresh-index-lock

bdraco commented Jun 20, 2026

Uh oh!

codspeed-hq Bot commented Jun 20, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 20, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

esphbot commented Jun 20, 2026 •

edited

Loading

Uh oh!

esphbot left a comment

Uh oh!

esphbot commented Jun 20, 2026 •

edited

Loading

Uh oh!

esphbot left a comment

Uh oh!

esphbot commented Jun 20, 2026

Uh oh!

esphbot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

bdraco commented Jun 20, 2026

What does this implement/fix?

Types of changes

Frontend coordination

Checklist

Uh oh!

codspeed-hq Bot commented Jun 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Footnotes

Uh oh!

codecov Bot commented Jun 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

esphbot commented Jun 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

esphbot left a comment

Choose a reason for hiding this comment

Uh oh!

esphbot commented Jun 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

esphbot left a comment

Choose a reason for hiding this comment

Uh oh!

esphbot commented Jun 20, 2026

PR Review — Retry version-history commits on a busy git index.lock

Checklist

Uh oh!

esphbot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codspeed-hq Bot commented Jun 20, 2026 •

edited

Loading

codecov Bot commented Jun 20, 2026 •

edited

Loading

esphbot commented Jun 20, 2026 •

edited

Loading

esphbot commented Jun 20, 2026 •

edited

Loading