Skip to content

Retry version-history commits on a busy git index.lock#1631

Merged
bdraco merged 6 commits into
mainfrom
version-history-retry-fresh-index-lock
Jun 20, 2026
Merged

Retry version-history commits on a busy git index.lock#1631
bdraco merged 6 commits into
mainfrom
version-history-retry-fresh-index-lock

Conversation

@bdraco

@bdraco bdraco commented Jun 20, 2026

Copy link
Copy Markdown
Member

What does this implement/fix?

When several dashboard instances share one config-dir git repo (a common HA setup: the prod, beta, and dev addons all running against one /config/esphome), a real external edit is detected by all of them at once and they collide on .git/index.lock. The loser raised, and because each collision counted as a commit failure, repeated collisions on the secondary instances would eventually trip the degraded flag.

A fresh (live) index.lock now raises a typed GitIndexLockBusyError from the synchronous git layer instead of failing outright. The async commit() catches it, backs off on the event loop, and retries the whole commit; the executor thread is not held during the wait. Whichever instance wins the lock commits the edit; the others retry, find nothing staged, and no-op cleanly. Stale-lock recovery (a crashed prior run) is unchanged.

This pairs with #1628, which removes the bulk of the contention by only committing on real YAML edits; this PR makes the genuine concurrent-edit case robust.

Related issue or feature (if applicable):

  • N/A; reported from production logs.

Types of changes

  • Bugfix (non-breaking change which fixes an issue) — bugfix
  • New feature (non-breaking change which adds functionality) — new-feature
  • Enhancement to an existing feature — enhancement
  • Breaking change (fix or feature that would cause existing functionality to not work as expected) — breaking-change
  • Refactor (no behaviour change) — refactor
  • Documentation only — docs
  • Maintenance / chore — maintenance
  • CI / workflow change — ci
  • Dependencies bump — dependencies

Frontend coordination

  • No frontend change needed
  • Companion frontend PR: esphome/device-builder-frontend#

Checklist

  • The code change is tested and works locally.
  • Pre-commit hooks pass (ruff, codespell, yaml/json/python checks).
  • Tests have been added or updated under tests/ where applicable.
  • components.index.json / definitions/components/*.json have not been hand-edited (regenerate via script/sync_components.py if a sync is needed).
  • Architecture-level changes are reflected in docs/ARCHITECTURE.md and/or docs/API.md.

When several dashboard instances share one config-dir git repo (a common
HA setup: prod, beta, and dev addons against one /config/esphome), a real
external edit is picked up by all of them at once and they collide on
.git/index.lock. The loser raised, and repeated collisions on the
secondary instances would trip the degraded flag.

A fresh (live) index.lock now raises a typed GitIndexLockBusyError from
the sync git layer; the async commit() backs off on the event loop and
retries the whole commit, so the executor thread isn't held during the
wait. Whichever instance wins commits the edit; the others retry, find
nothing staged, and no-op cleanly. Stale-lock recovery is unchanged.
@bdraco bdraco added the enhancement Improvement to an existing feature label Jun 20, 2026
@codspeed-hq

codspeed-hq Bot commented Jun 20, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

✅ 27 untouched benchmarks


Comparing version-history-retry-fresh-index-lock (b4b36c9) with main (3e78e3d)1

Open in CodSpeed

Footnotes

  1. No successful run was found on main (9ae683a) during the generation of this report, so 3e78e3d was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@codecov

codecov Bot commented Jun 20, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.54%. Comparing base (f4255cd) to head (b4b36c9).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1631   +/-   ##
=======================================
  Coverage   99.54%   99.54%           
=======================================
  Files         227      227           
  Lines       17955    17993   +38     
=======================================
+ Hits        17874    17912   +38     
  Misses         81       81           
Flag Coverage Δ
py3.12 99.52% <100.00%> (+<0.01%) ⬆️
py3.14 99.54% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
..._builder/controllers/version_history/controller.py 100.00% <100.00%> (ø)
...ce_builder/controllers/version_history/git_repo.py 100.00% <100.00%> (ø)

... and 9 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the robustness of version-history commits when multiple dashboard instances share the same config-dir git repository by treating a live .git/index.lock as a retryable condition rather than a hard commit failure.

Changes:

  • Introduce GitIndexLockBusyError to represent fresh index.lock contention (live concurrent writer).
  • Update the async VersionHistoryController.commit() to back off on the event loop and retry commits when the lock is busy.
  • Add targeted unit tests covering the new error mapping and controller retry behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
esphome_device_builder/controllers/version_history/git_repo.py Adds a typed “busy index.lock” error and updates _run_write to map lock contention into that error type.
esphome_device_builder/controllers/version_history/controller.py Adds bounded async retry/backoff around git commits when GitIndexLockBusyError is raised.
tests/controllers/version_history/test_git_repo.py Updates existing lock-related tests and adds new unit tests for _run_write busy/stale/non-lock error handling.
tests/controllers/version_history/test_controller.py Adds async tests to verify the retry loop behavior and failure behavior after exhausting retries.

Comment thread esphome_device_builder/controllers/version_history/git_repo.py Outdated
Comment thread esphome_device_builder/controllers/version_history/git_repo.py Outdated
@esphbot

esphbot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Previous review — superseded by a newer review below.

@esphbot esphbot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found.

@bdraco bdraco marked this pull request as ready for review June 20, 2026 21:17
Gate the busy-retry on lock freshness: a lock aged past the stale
threshold that we could not clear (an adopted repo we won't touch, or a
managed lock we failed to unlink) will not free itself, so surface the
original error immediately instead of spinning through the bounded
backoff. A fresh, vanished, or unresolvable lock stays retryable.
@esphbot

esphbot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Previous review — superseded by a newer review below.

@esphbot esphbot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found.

@esphbot

esphbot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

PR Review — Retry version-history commits on a busy git index.lock

Clean, well-scoped concurrency fix. Merge-ready.

Strengths:

  • The new _run_write is a bounded while True loop with a cleared_stale one-shot flag: it clears a stale lock at most once, then every path either returns or raises (max 2 iterations) — no infinite-spin risk, verified by the four _run_write unit tests.
  • Error classification is precise: fresh lock (live writer) → retryable GitIndexLockBusyError; stale-but-unclearable (adopted repo / unlinkable lock) → original CalledProcessError propagates and counts as a real failure. The TOCTOU case (lock vanishes before the freshness stat) correctly degrades to 'fresh → retry resolves it'.
  • Backoff is awaited on the loop (await asyncio.sleep), never inside _in_executor, so the executor thread isn't pinned during the ~3s bounded wait — matches the PR's stated goal.
  • Except ordering in commit() is load-bearing and correct: GitIndexLockBusyError (a CalledProcessError subclass) is caught before the GIT_COMMIT_ERRORS catch-all.
  • Retry idempotency holds: a loser instance re-stages pathspec-scoped and no-ops via git diff --cached --quiet, exactly as described.
  • Test coverage is thorough across both layers (sync _run_write busy/stale/non-lock/post-clear-recollision, async controller retry-then-succeed and give-up-after-N).

Notes:

  • The earlier @esphbot review's suggestion Add section config API and user preferences #1 (post-clear in-place retry not mapped to busy) is already resolved in this diff — the loop continue reclassifies a post-clear re-collision as busy, pinned by test_run_write_reclassifies_a_post_clear_recollision_as_busy. No action needed.
  • Copilot's docstring-style note on GitIndexLockBusyError is incorrect for this repo: the class docstring is single-line, which is the documented default. No change needed.
  • Minor/by-design: self._lock is held across the up-to-~3s backoff, so subsequent commits on the same instance queue behind a retrying one. This preserves commit ordering and only triggers on genuine cross-instance collisions — intentional, not a defect.


Checklist

  • Error handling — correct except ordering, no silent swallow
  • Concurrency — backoff does not hold executor thread
  • Retry bounds finite (no infinite loop)
  • Retry idempotency (loser no-ops cleanly)
  • Stale-unclearable lock still propagates as hard failure
  • Test coverage for new branches
  • Docstring style matches repo convention

Automated review by Kōan (Claude) HEAD=b4b36c9 2 min 9s

@esphbot esphbot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found.

@bdraco bdraco merged commit e53f57f into main Jun 20, 2026
21 checks passed
@bdraco bdraco deleted the version-history-retry-fresh-index-lock branch June 20, 2026 21:42
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 22, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

enhancement Improvement to an existing feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants