Retry version-history commits on a busy git index.lock #1631
When several dashboard instances share one config-dir git repo (a common HA setup: prod, beta, and dev addons against one /config/esphome), a real external edit is picked up by all of them at once and they collide on .git/index.lock. The loser raised, and repeated collisions on the secondary instances would trip the degraded flag. A fresh (live) index.lock now raises a typed GitIndexLockBusyError from the sync git layer; the async commit() backs off on the event loop and retries the whole commit, so the executor thread isn't held during the wait. Whichever instance wins commits the edit; the others retry, find nothing staged, and no-op cleanly. Stale-lock recovery is unchanged.
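The back-off-and-retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `commit_with_retry`, `commit_once`, the attempt count, and the delay values are all assumptions, and `GitIndexLockBusyError` is redefined locally as a stand-in for the typed error the PR introduces.

```python
import asyncio
import random


class GitIndexLockBusyError(Exception):
    """Local stand-in for the typed error named in this PR."""


async def commit_with_retry(commit_once, *, attempts=5, base_delay=0.1, max_delay=2.0):
    """Run the blocking ``commit_once`` callable, retrying while index.lock is busy.

    ``attempts``, ``base_delay`` and ``max_delay`` are illustrative values only.
    """
    loop = asyncio.get_running_loop()
    for attempt in range(attempts):
        try:
            # The blocking git work runs in the default executor.
            return await loop.run_in_executor(None, commit_once)
        except GitIndexLockBusyError:
            if attempt == attempts - 1:
                raise  # bounded: surface the error after the last attempt
            delay = min(max_delay, base_delay * 2**attempt)
            # Wait on the event loop, so no executor thread is held meanwhile.
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
```

Because the whole commit is retried, an instance that lost the race re-evaluates what is staged on its next attempt, which is how it can end up doing nothing once the winner has committed.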
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage Diff
| | main | #1631 | +/- |
|---|---|---|---|
| Coverage | 99.54% | 99.54% | |
| Files | 227 | 227 | |
| Lines | 17955 | 17993 | +38 |
| Hits | 17874 | 17912 | +38 |
| Misses | 81 | 81 | |
Pull request overview
This PR improves the robustness of version-history commits when multiple dashboard instances share the same config-dir git repository by treating a live .git/index.lock as a retryable condition rather than a hard commit failure.
Changes:
- Introduce `GitIndexLockBusyError` to represent fresh `index.lock` contention (live concurrent writer).
- Update the async `VersionHistoryController.commit()` to back off on the event loop and retry commits when the lock is busy.
- Add targeted unit tests covering the new error mapping and controller retry behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `esphome_device_builder/controllers/version_history/git_repo.py` | Adds a typed "busy index.lock" error and updates `_run_write` to map lock contention into that error type. |
| `esphome_device_builder/controllers/version_history/controller.py` | Adds bounded async retry/backoff around git commits when `GitIndexLockBusyError` is raised. |
| `tests/controllers/version_history/test_git_repo.py` | Updates existing lock-related tests and adds new unit tests for `_run_write` busy/stale/non-lock error handling. |
| `tests/controllers/version_history/test_controller.py` | Adds async tests to verify the retry loop behavior and failure behavior after exhausting retries. |
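As a rough illustration of the error mapping described for `_run_write`, the sketch below classifies a failed git write by its stderr. It is an assumption-laden sketch, not the PR's code: `run_write`, `is_index_lock_error`, and the `GitError` base class are hypothetical names, and matching on the English "File exists" message assumes git's output is not localized.

```python
import subprocess


class GitError(Exception):
    """Hypothetical base error for a failed git write."""


class GitIndexLockBusyError(GitError):
    """A live writer currently holds .git/index.lock (stand-in for the PR's type)."""


def is_index_lock_error(stderr: str) -> bool:
    # Git reports lock contention along the lines of:
    #   fatal: Unable to create '<repo>/.git/index.lock': File exists.
    return "index.lock" in stderr and "File exists" in stderr


def run_write(args, cwd):
    """Run a git write command, mapping lock contention to the typed error."""
    proc = subprocess.run(["git", *args], cwd=cwd, capture_output=True, text=True)
    if proc.returncode == 0:
        return proc.stdout
    if is_index_lock_error(proc.stderr):
        raise GitIndexLockBusyError(proc.stderr.strip())
    raise GitError(proc.stderr.strip())
```

Making the busy error a subclass of the general one is a design choice of this sketch: callers that do not care about retrying can keep catching the base type, while the async layer can catch the narrower one.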
esphbot
left a comment
No blocking issues found.
Gate the busy-retry on lock freshness: a lock aged past the stale threshold that we could not clear (an adopted repo we won't touch, or a managed lock we failed to unlink) will not free itself, so surface the original error immediately instead of spinning through the bounded backoff. A fresh, vanished, or unresolvable lock stays retryable.
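One way to picture the freshness gate is a small predicate over the lock file's age. The sketch below is illustrative only: `lock_is_retryable` and the 60-second `STALE_AFTER_SECONDS` threshold are hypothetical, and the real implementation may judge staleness differently.

```python
import os
import time

STALE_AFTER_SECONDS = 60.0  # hypothetical stale threshold, for illustration


def lock_is_retryable(lock_path, *, now=None, stale_after=STALE_AFTER_SECONDS):
    """Return True if waiting could plausibly let the lock clear on its own."""
    try:
        mtime = os.stat(lock_path).st_mtime
    except FileNotFoundError:
        return True  # vanished: the holder already finished
    except OSError:
        return True  # unresolvable: age unknown, so stay retryable
    now = time.time() if now is None else now
    # A lock older than the threshold will not free itself; do not spin on it.
    return (now - mtime) < stale_after
```

A caller would consult this before entering the backoff loop and re-raise the original error immediately when it returns False.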
esphbot
left a comment
No blocking issues found.
PR Review: Retry version-history commits on a busy git index.lock
Clean, well-scoped concurrency fix. Merge-ready.
Strengths:
Notes:
Checklist
Automated review by Kōan (Claude)
esphbot
left a comment
No blocking issues found.
What does this implement/fix?
When several dashboard instances share one config-dir git repo (a common HA setup: the prod, beta, and dev addons all running against one `/config/esphome`), a real external edit is detected by all of them at once and they collide on `.git/index.lock`. The loser raised, and because each collision counted as a commit failure, repeated collisions on the secondary instances would eventually trip the `degraded` flag.
A fresh (live) `index.lock` now raises a typed `GitIndexLockBusyError` from the synchronous git layer instead of failing outright. The async `commit()` catches it, backs off on the event loop, and retries the whole commit; the executor thread is not held during the wait. Whichever instance wins the lock commits the edit; the others retry, find nothing staged, and no-op cleanly. Stale-lock recovery (a crashed prior run) is unchanged.
This pairs with #1628, which removes the bulk of the contention by only committing on real YAML edits; this PR makes the genuine concurrent-edit case robust.
Related issue or feature (if applicable):
Types of changes
- `bugfix`
- `new-feature`
- `enhancement`
- `breaking-change`
- `refactor`
- `docs`
- `maintenance`
- `ci`
- `dependencies`
Frontend coordination
Checklist
- … `ruff`, `codespell`, yaml/json/python checks).
- … `tests/` where applicable.
- `components.index.json` / `definitions/components/*.json` have not been hand-edited (regenerate via `script/sync_components.py` if a sync is needed).
- … `docs/ARCHITECTURE.md` and/or `docs/API.md`.