Skip to content

feat: add file locking to prevent concurrent state.json corruption#20

Open
don-petry wants to merge 2 commits intooneirosoft:mainfrom
don-petry:feat/file-locking
Open

feat: add file locking to prevent concurrent state.json corruption#20
don-petry wants to merge 2 commits intooneirosoft:mainfrom
don-petry:feat/file-locking

Conversation

@don-petry
Copy link
Copy Markdown
Contributor

@don-petry don-petry commented Mar 31, 2026

Why?

Running multiple dgr processes concurrently (e.g., in editor hooks or parallel terminals) can cause them to read and write state.json simultaneously, leading to silent data corruption or lost updates. A file lock serializes access so only one dgr process modifies state at a time.

Summary

  • Adds a StoreLock guard (src/core/store/lock.rs) that uses atomic create_new file creation to prevent concurrent dgr processes from writing to state.json simultaneously
  • Creates the .dagger directory if it doesn't exist before acquiring the lock, so locking works even on first initialization
  • Lock is acquired before initialize_store to prevent races during initial setup
  • Integrates the lock into StoreSession so it is acquired when a session opens and released on drop
  • Includes unit tests for lock acquisition, mutual exclusion, and cleanup on drop

Relates to #10 item 1 (file locking).

Test plan

  • cargo test passes (new lock tests in lock.rs)
  • Run two dgr commands simultaneously and verify the second reports "another dgr process appears to be running"
  • Verify the lock file is cleaned up after normal process exit
  • Verify manual deletion of .dagger/lock recovers from a stale lock after crash

🤖 Generated with Claude Code

Use an advisory lock file (.dagger/lock) with atomic create_new to
ensure only one dgr process can modify state.json at a time. The lock
is held for the duration of a StoreSession and released on drop.

Relates to oneirosoft#10 item 1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 31, 2026 02:20
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an advisory lock file to prevent multiple concurrent dgr processes from simultaneously mutating the store (notably state.json), reducing the risk of store corruption under parallel invocations.

Changes:

  • Added a StoreLock guard that acquires an exclusive lock via atomic create_new file creation and removes the lock file on drop.
  • Integrated lock acquisition into StoreSession opening/creation flows.
  • Added unit tests for lock acquisition, mutual exclusion, and cleanup; added tempfile as a dev-dependency.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/core/store/lock.rs New lock guard implementation and unit tests.
src/core/store/session.rs Acquires the lock as part of session creation/opening.
src/core/store/mod.rs Exposes the new lock module within store.
src/core/commit.rs Switches session construction to a helper that acquires the lock.
Cargo.toml Adds tempfile for the new unit tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +14 to +22
pub fn acquire(dagger_root: &Path) -> io::Result<Self> {
let lock_path = dagger_root.join("lock");

// Try to create the lock file exclusively.
// O_CREAT | O_EXCL ensures atomic creation — fails if file already exists.
match fs::OpenOptions::new()
.write(true)
.create_new(true)
.open(&lock_path)
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StoreLock::acquire assumes dagger_root already exists. That prevents taking the lock before store initialization (e.g., in open_or_initialize) and will fail with NotFound if the .dagger dir is missing. Consider creating the directory (e.g., fs::create_dir_all(dagger_root)?) before attempting create_new so locking can protect initialization too.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the latest push.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b5373c0. StoreLock::acquire now calls fs::create_dir_all(dagger_root) before attempting to create the lock file, so the lock can be taken even before the store directory is fully initialized.

Comment on lines +58 to +61
let store_initialization = initialize_store(&paths, trunk_branch)?;
let config =
load_config(&paths)?.ok_or_else(|| io::Error::other("dagger config is missing"))?;
let lock = StoreLock::acquire(&paths.root)?;
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In open_or_initialize, the lock is acquired after initialize_store (which may write config.json/state.json/events). Two concurrent dgr processes can still race during initialization and write state concurrently before either takes the lock. If StoreLock::acquire is updated to create_dir_all on the root, consider acquiring the lock before calling initialize_store so initialization and subsequent reads/writes are fully mutually exclusive.

Suggested change
let store_initialization = initialize_store(&paths, trunk_branch)?;
let config =
load_config(&paths)?.ok_or_else(|| io::Error::other("dagger config is missing"))?;
let lock = StoreLock::acquire(&paths.root)?;
let lock = StoreLock::acquire(&paths.root)?;
let store_initialization = initialize_store(&paths, trunk_branch)?;
let config =
load_config(&paths)?.ok_or_else(|| io::Error::other("dagger config is missing"))?;

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the latest push.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b5373c0. Lock acquisition is now moved before initialize_store in open_or_initialize, so two concurrent processes can no longer race on initialization.

Comment on lines 307 to 310
let actions =
restack::plan_after_branch_advance(&state, node.id, &node.branch_name, old_head_oid)?;
let mut session = StoreSession {
repo: context.repo.clone(),
paths: store_paths,
config,
state,
};
let mut session = StoreSession::from_parts(context.repo.clone(), store_paths, config, state)?;
record_branch_divergence_state(&mut session, node.id, BranchDivergenceState::Diverged)?;
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StoreSession::from_parts acquires the store lock only at this point, but load_state and plan_after_branch_advance were already done earlier without holding the lock. That means this code can still read state.json while another dgr process holds the lock (and may be mid-write), and the planned restack actions may be based on stale state. Consider acquiring the lock before reading state / planning (or pass an already-acquired StoreLock into session construction) and reloading state after the lock is held.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the latest push.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b5373c0. The lock is now acquired in maybe_restack_after_commit_inner before load_state, and the pre-acquired lock is passed into the session via the new StoreSession::from_lock constructor. Note that load_config remains outside the lock because the config is written once during init and is effectively immutable after that—it is never modified by concurrent operations, so reading it without the lock is safe. The mutable state.json is the resource that needs serialization, and that is now fully covered.

- Ensure dagger_root directory exists before creating lock file
- Move lock acquisition before initialize_store in open_or_initialize
- Acquire lock before state read in maybe_restack_after_commit_inner
- Add from_lock constructor to StoreSession for pre-acquired locks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants