Skip to content

Conversation

igorsol
Copy link

@igorsol igorsol commented Oct 15, 2025

Two things are implemented by this PR:

  • ensure the backup cursor does not leak in case of errors or shutdown
  • ensure that storage engine switched back to original location in case of errors or shutdown. In case when we are going to retry FCBIS attempt this allows new attempt to start with correct storage location

@igorsol igorsol requested a review from Copilot October 15, 2025 21:44
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Ensures FCBIS cleans up correctly on errors/shutdown by closing the backup cursor and restoring the storage location back to the configured dbpath when needed.

  • Add _restoreStorageLocation helper and ScopeGuard usage to reliably revert storage location on failure paths.
  • Proactively kill any open backup cursor during cancellation to prevent leaks.
  • Minor comment fix.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/mongo/db/repl/initial_syncer_fcb.h Declares _restoreStorageLocation to centralize restoring dbpath on failure.
src/mongo/db/repl/initial_syncer_fcb.cpp Implements storage restoration via ScopeGuard in key callbacks; kills backup cursor during cancellation; fixes a comment typo.

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines 2272 to 2275
ScopeGuard storageGuard([this, &lock, opCtx = opCtx.get()] {
// Restore storage location back to original dbpath in case of any failure
_restoreStorageLocation(lock, opCtx);
});

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the _needToSwitchBackToOriginalDBPath variable (set on line 2291) completely and move this ScopeGuard down on the variable's place?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this is possible. I implemented this in an additional commit

… changes

If first invocation of 'shutdown' occurs while we are switching storage locations
it will wait until '_restoreStorageLocation' switches back to original location.
Thus we guarantee that there will be no storage changes during subsequent invocation of 'shutdown'.
(second 'shutdown' invocation happens when shutdown thread is already created opCtx
and is waiting for initial syncer finish but if initial syncer is changing storage
location it will try to cancel all operations including opCtx of the shutdown thread
thus creating a deadlock)
@igorsol
Copy link
Author

igorsol commented Oct 17, 2025

Added two additional commits:

  • to remove _needToSwitchBackToOriginalDBPath as @ktrushin suggested
  • to avoid deadlock between shutdown thread and storage change activity (more details in the commit's comment)

@igorsol igorsol requested a review from Copilot October 17, 2025 18:10
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

src/mongo/db/repl/initial_syncer_fcb.cpp:1

  • _switchStorageLocation and _restoreStorageLocation require a GlobalLock (MODE_X), enforced by invariant(opCtx->lockState()->isW()). Here, no Lock::GlobalLock is acquired before calling either function, so this will trip the invariant and/or perform the switch without the required lock. Acquire a GlobalLock before the switch, and also inside the storageGuard before calling _restoreStorageLocation, e.g.: Lock::GlobalLock lk(opCtx.get(), MODE_X); before the unlock/switch, and Lock::GlobalLock lk(opCtx, MODE_X); inside the guard.
/*======

src/mongo/db/repl/initial_syncer_fcb.cpp:2443

  • This callback performs a storage location switch under released _mutex but does not set _inStorageChange = true before unlocking. That undermines the shutdown wait mechanism and can reintroduce the deadlock this flag was intended to prevent. Mirror the pattern used in _switchToDownloadedCallback: set _inStorageChange = true before lock.unlock(), and rely on _restoreStorageLocation (invoked by the storageGuard at scope exit) to clear the flag and notify.
        // Switch storage to a dummy location
        lock.unlock();
        status = _switchStorageLocation(opCtx.get(), _cfgDBPath + "/.initialsync/.dummy");
        lock.lock();

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link

@ktrushin ktrushin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@igorsol igorsol merged commit ca4d4da into v7.0 Oct 20, 2025
1 check passed
@igorsol igorsol deleted the psmdb-1755-70 branch October 20, 2025 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants