
Conversation

alphaprinz (Contributor) commented Oct 3, 2025

Describe the Problem

Customer has 1 host in the BS (backingstore).
This host drops below 20% free capacity, so we issue a LOW_CAPACITY state.
The customer adds a host to the BS; the system is writable and has plenty of free space.
The BS state is now MANY_STORAGE_ISSUES, despite the BS being functional and having lots of space on the new host.

Explain the Changes

  1. Don't count hosts in LOW_CAPACITY mode when calculating MANY_STORAGE_ISSUES.
    (The low-capacity state is already handled by the "free_ratio" and "free" checks.)
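In code terms, the fix subtracts the LOW_CAPACITY count from the issue numerator. A minimal sketch (variable names follow the snippets quoted in the review below):

```js
const storage_low_capacity = storage_by_mode.LOW_CAPACITY || 0;
// don't count individual storage with low capacity as having issues;
// low capacity is handled for the entire BS by the free_ratio check below
const storage_issues_ratio =
    ((storage_count - storage_optimal - storage_low_capacity) / storage_count) * 100;
```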

Issues: Fixed #xxx / Gap #xxx

  1. https://issues.redhat.com/browse/DFBUGS-4152

Testing Instructions:

  1. Create a BS with one host.
  2. Let the host reach LOW_CAPACITY status (manually upload data, or alter the nodes_monitor._filter_hosts() result).
  3. Add a new host to the BS (i.e., scale out the BS).
  4. The BS status after scale-out should be OPTIMAL (a jest-style sketch of this check follows the checklist below).
  • Doc added/updated
  • Tests added
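
For step 4, one way to pin the expected behavior down in a unit test. This is a hedged sketch only: the standalone storage_issues_ratio helper is hypothetical — the real computation lives inline in pool_server.js — and the thresholds are not asserted here:

```js
// Hypothetical helper mirroring the fixed computation in pool_server.js;
// extracting it into a standalone function is for illustration only.
function storage_issues_ratio({ storage_count, storage_optimal, storage_low_capacity }) {
    return ((storage_count - storage_optimal - storage_low_capacity) / storage_count) * 100;
}

test('BS scaled out past a LOW_CAPACITY host reports no storage issues', () => {
    // two hosts: the original LOW_CAPACITY host plus the newly added OPTIMAL one
    const ratio = storage_issues_ratio({ storage_count: 2, storage_optimal: 1, storage_low_capacity: 1 });
    expect(ratio).toBe(0); // well below any MANY_STORAGE_ISSUES threshold
});
```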

Summary by CodeRabbit

  • Bug Fixes

    • Improved storage health calculation so low-capacity storage is not treated as an issue, reducing false alerts and improving status stability.
  • Documentation

    • Added a clarifying note explaining how low-capacity storage is accounted for in overall free-space checks.

coderabbitai bot commented Oct 3, 2025

Walkthrough

Adjusts storage issue calculation in src/server/system_services/pool_server.js to exclude low-capacity storage from issue counts by introducing storage_low_capacity from storage_by_mode.LOW_CAPACITY and subtracting it in storage_issues_ratio. Adds a clarifying comment; no other control flow changes.

Changes

Cohort / File(s): Pool server storage issue computation — `src/server/system_services/pool_server.js`
Summary: Added storage_low_capacity from storage_by_mode.LOW_CAPACITY (default 0). Updated storage_issues_ratio to subtract low-capacity from the numerator. Inserted a comment explaining low-capacity handling via free_ratio. No other logic or flow changes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check — Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — Passed: the title clearly states the main change (the pool server will no longer issue MANY_STORAGE_ISSUES after scaling up) and is concise and specific to the change.
  • Docstring Coverage — Passed: docstring coverage is 100.00%, above the required 80.00% threshold.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de248c1 and eeef74a.

📒 Files selected for processing (1)
  • src/server/system_services/pool_server.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/server/system_services/pool_server.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run-package-lock-validation
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests


coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2edf186 and de248c1.

📒 Files selected for processing (1)
  • src/server/system_services/pool_server.js (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-08T13:10:36.141Z
Learnt from: naveenpaul1
PR: noobaa/noobaa-core#9182
File: src/server/system_services/pool_server.js:1314-1317
Timestamp: 2025-08-08T13:10:36.141Z
Learning: In src/server/system_services/pool_server.js (and config usage), the constant config.INTERNAL_STORAGE_POOL_NAME has been removed from the system. Future logic should not depend on this constant and should instead use config.DEFAULT_POOL_NAME or structural markers (e.g., pool.resource_type === 'INTERNAL' or pool.mongo_info) to identify internal/mongo pools.

Applied to files:

  • src/server/system_services/pool_server.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run-package-lock-validation
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (2)
src/server/system_services/pool_server.js (2)

1193-1193: LGTM!

The variable introduction follows the established pattern and properly defaults to 0 when storage_by_mode.LOW_CAPACITY is undefined.


1195-1197: Approve the logic change.

The change correctly excludes LOW_CAPACITY hosts from the storage issues ratio calculation, which aligns with the PR objective. Since LOW_CAPACITY is handled separately by the free_ratio and free checks (lines 1215, 1225-1226), this prevents the bucket-store from incorrectly entering MANY_STORAGE_ISSUES state after scaling up.

The clarifying comment appropriately explains the rationale.

```diff
-const storage_issues_ratio = ((storage_count - storage_optimal) / storage_count) * 100;
+//don't count individual storage with low capacity as having issues.
+//low capacity is handled for the entire BS by free_ratio check below
+const storage_issues_ratio = ((storage_count - storage_optimal - storage_low_capacity) / storage_count) * 100;
```

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Verify handling of zero storage_count.

If storage_count is zero, the calculation (storage_count - storage_optimal - storage_low_capacity) / storage_count results in NaN. This would cause the storage issue ratio checks (lines 1218, 1220-1221) to fail silently since NaN >= 90 evaluates to false.

Confirm whether storage_count can be zero when host_count > 0 (e.g., when only S3 gateway nodes are present). If so, consider adding a guard condition or defaulting storage_issues_ratio to 0 when storage_count === 0.

Note: Line 1202 has similar logic for s3_issues_ratio, suggesting this may be a pre-existing pattern.

Run the following script to check if there are any guards or validations for zero counts elsewhere in the codebase:


🏁 Script executed:

```bash
#!/bin/bash
# Description: Search for similar ratio calculations and guards for zero denominators

# Search for ratio calculations with potential division by zero
rg -n -C3 'ratio.*=.*\/.*count' src/server/system_services/pool_server.js

# Search for guards checking if count is zero before division
rg -n -C3 'count.*===.*0|count.*>.*0.*ratio' src/server/system_services/pool_server.js
```

Length of output: 1635


🏁 Script executed:

```bash
#!/bin/bash
# Search for storage_issues_ratio definition and nearby guards
rg -n -C3 'storage_issues_ratio' src/server/system_services/pool_server.js

# Search for any if-statements checking storage_count before use
rg -n -C3 'if.*storage_count' src/server/system_services/pool_server.js

# Search for s3_issues_ratio definition and nearby guards
rg -n -C3 's3_issues_ratio' src/server/system_services/pool_server.js

# Search for any if-statements checking s3_issues_ratio or its denominator
rg -n -C3 'if.*s3_issues_ratio' src/server/system_services/pool_server.js
```

Length of output: 2783


Guard against zero denominators when computing storage_issues_ratio and s3_issues_ratio. Default each ratio to 0 if its count is 0 (e.g. replace lines 1197 and 1202 with count > 0 ? (…) : 0) to avoid NaN.

🤖 Prompt for AI Agents
In src/server/system_services/pool_server.js around lines 1197 and 1202, the
calculations for storage_issues_ratio and s3_issues_ratio can divide by zero
producing NaN; update each assignment to check the denominator count
(storage_count and s3_count) and return 0 when the count is 0, otherwise compute
the percentage as before (i.e., use a ternary like count > 0 ? (numerator/
count) * 100 : 0) so the ratios default to 0 when their counts are zero.

alphaprinz force-pushed the storage_issue_entire_bs_4152 branch from de248c1 to eeef74a on October 6, 2025 at 16:43.
```js
const storage_count = hosts.by_service.STORAGE;
const storage_offline = storage_by_mode.OFFLINE || 0;
const storage_optimal = storage_by_mode.OPTIMAL || 0;
const storage_low_capacity = storage_by_mode.LOW_CAPACITY || 0;
```
A Member commented:
What about NO_CAPACITY mode?
Worth going over all the other modes (enum here) to see if they can also be excluded from MANY_STORAGE_ISSUES.
Maybe anything that can result from a user operation (e.g. DELETING) or a temporary state (INITIALIZING) can also be ignored. @alphaprinz @nimrod-becker WDYT?

alphaprinz (Contributor, Author) replied:

I tend to go with a no.
  • Temp states will go away and no one will care about them.
  • User actions affecting the state makes sense.
  • NO_CAPACITY is tricky to ignore, because if some other issue incapacitates the other hosts you're left with nothing (as opposed to LOW_CAPACITY, which would still allow you some operational uptime).
