Skip to content

Conversation

tangledbytes
Copy link
Member

@tangledbytes tangledbytes commented Oct 15, 2025

Describe the Problem

Tape team discovered that sometimes, noobaa-cli glacier migrate noobaa-cli glacier restore would hang indefinitely. Upon investigation it was found that it was the migrate process which was hanging but due to acquiring a cluster wide lock, it was blocking the restore command as well.

The migrate process was hanging because it was trying to acquire an EXCLUSIVE lock on the migration lock but one of the NooBaa server process was not releasing the lock. We have a detection mechanism in persistent_logger which must release the lock eventually however this detection was failing. The root cause of the failure was that in an earlier PR of mine (#9183), I missed updating the fh_stat information which is used by a part of this detection mechanism,

Explain the Changes

This PR updates the code to ensure that the fh_stat value is updated correctly.

Issues: Fixed #xxx / Gap #xxx

  1. https://github.ibm.com/Tier2/Test-Issue/issues/159

Testing Instructions:

  1. Set NOOBAA_LOG_LEVEL="all" in config
  2. Start the NooBaa server
  3. Put an object (to ensure that the process is holding the lock on the migration log)
  4. Run rm -rf <LOG_DIR>/migrate.log && touch <LOG_DIR>/migrate.log
  5. Ensure that NooBaa emits active file changed, closing for namespace: migrate
  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • Bug Fixes
    • Correctly initializes log metadata and in-memory size counters when opening active logs, reducing inconsistencies.
    • Improves accuracy of log size tracking and stability during logging sessions.

Signed-off-by: Utkarsh Srivastava <[email protected]>
Copy link

coderabbitai bot commented Oct 15, 2025

Walkthrough

Init now sets fh_stat from the opened file handle’s stat and resets local_size to 0 after opening the active log file.

Changes

Cohort / File(s) Summary
Persistent Logger Initialization
src/util/persistent_logger.js
In init(), after opening the file handle, await fh.stat(fs_context) to set this.fh_stat and reset this.local_size to 0.

Sequence Diagram(s)

sequenceDiagram
  participant App
  participant PersistentLogger
  participant FS as FileSystem

  App->>PersistentLogger: init()
  PersistentLogger->>FS: open(active log file)
  FS-->>PersistentLogger: file handle (fh)
  PersistentLogger->>FS: fh.stat(fs_context)
  FS-->>PersistentLogger: stats (fh_stat)
  Note over PersistentLogger: Set this.fh_stat = stats<br/>Set this.local_size = 0
  PersistentLogger-->>App: init() resolved
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

size/M

Suggested reviewers

  • guymguym
  • alphaprinz

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly identifies that the pull request resolves a migration hang occurring under a race condition, which matches the core issue addressed by the changeset. It is concise, specific to the main bug fix, and clear to team members reviewing history.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2367740 and 358f800.

📒 Files selected for processing (1)
  • src/util/persistent_logger.js (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run-jest-unit-tests
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
🔇 Additional comments (1)
src/util/persistent_logger.js (1)

62-63: LGTM! Critical fix for lock release detection.

These changes correctly address the hang issue:

  1. Line 62 populates this.fh_stat with the opened file's metadata, enabling the polling mechanism at line 229 to detect when the active file has been replaced (via inode comparison). Without this, the condition if (this.fh_stat && stat.ino !== this.fh_stat.ino) would always be false, preventing lock release.

  2. Line 63 resets this.local_size to 0 for the fresh file handle, which is consistent with the close() method (lines 85-86) and correct since we're starting with a new file.

The fix is minimal, targeted, and directly addresses the root cause described in the PR.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@tangledbytes tangledbytes requested review from a team, alphaprinz, dannyzaken, jackyalbo, naveenpaul1, nimrod-becker and romayalon and removed request for a team and romayalon October 15, 2025 11:06
@tangledbytes
Copy link
Member Author

@alphaprinz this bug might be effecting bucket notifications as well
@jackyalbo this bug might be effecting bucket logging as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants