-
Notifications
You must be signed in to change notification settings - Fork 66
PSMDB-1755 ensure FCBIS terminates correctly in case of errors or shutdown #1582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Ensures FCBIS cleans up correctly on errors/shutdown by closing the backup cursor and restoring the storage location back to the configured dbpath when needed.
- Add _restoreStorageLocation helper and ScopeGuard usage to reliably revert storage location on failure paths.
- Proactively kill any open backup cursor during cancellation to prevent leaks.
- Minor comment fix.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
src/mongo/db/repl/initial_syncer_fcb.h | Declares _restoreStorageLocation to centralize restoring dbpath on failure. |
src/mongo/db/repl/initial_syncer_fcb.cpp | Implements storage restoration via ScopeGuard in key callbacks; kills backup cursor during cancellation; fixes a comment typo. |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
7394f9e
to
f1bf14e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
ScopeGuard storageGuard([this, &lock, opCtx = opCtx.get()] { | ||
// Restore storage location back to original dbpath in case of any failure | ||
_restoreStorageLocation(lock, opCtx); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the _needToSwitchBackToOriginalDBPath
variable (set on line 2291) completely and move this ScopeGuard
down on the variable's place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, this is possible. I implemented this in an additional commit
… changes If first invocation of 'shutdown' occurs while we are switching storage locations it will wait until '_restoreStorageLocation' switches back to original location. Thus we guarantee that there will be no storage changes during subsequent invocation of 'shutdown'. (second 'shutdown' invocation happens when shutdown thread is already created opCtx and is waiting for initial syncer finish but if initial syncer is changing storage location it will try to cancel all operations including opCtx of the shutdown thread thus creating a deadlock)
Added two additional commits:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
src/mongo/db/repl/initial_syncer_fcb.cpp:1
- _switchStorageLocation and _restoreStorageLocation require a GlobalLock (MODE_X), enforced by invariant(opCtx->lockState()->isW()). Here, no Lock::GlobalLock is acquired before calling either function, so this will trip the invariant and/or perform the switch without the required lock. Acquire a GlobalLock before the switch, and also inside the storageGuard before calling _restoreStorageLocation, e.g.: Lock::GlobalLock lk(opCtx.get(), MODE_X); before the unlock/switch, and Lock::GlobalLock lk(opCtx, MODE_X); inside the guard.
/*======
src/mongo/db/repl/initial_syncer_fcb.cpp:2443
- This callback performs a storage location switch under released _mutex but does not set _inStorageChange = true before unlocking. That undermines the shutdown wait mechanism and can reintroduce the deadlock this flag was intended to prevent. Mirror the pattern used in _switchToDownloadedCallback: set _inStorageChange = true before lock.unlock(), and rely on _restoreStorageLocation (invoked by the storageGuard at scope exit) to clear the flag and notify.
// Switch storage to a dummy location
lock.unlock();
status = _switchStorageLocation(opCtx.get(), _cfgDBPath + "/.initialsync/.dummy");
lock.lock();
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Two things are implemented by this PR: