-
Notifications
You must be signed in to change notification settings - Fork 89
pool server - don't issue MANY_STORAGE_ISSUES after scaling up (dfbgus 4152) #9235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughAdjusts storage issue calculation in Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/server/system_services/pool_server.js
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-08T13:10:36.141Z
Learnt from: naveenpaul1
PR: noobaa/noobaa-core#9182
File: src/server/system_services/pool_server.js:1314-1317
Timestamp: 2025-08-08T13:10:36.141Z
Learning: In src/server/system_services/pool_server.js (and config usage), the constant config.INTERNAL_STORAGE_POOL_NAME has been removed from the system. Future logic should not depend on this constant and should instead use config.DEFAULT_POOL_NAME or structural markers (e.g., pool.resource_type === 'INTERNAL' or pool.mongo_info) to identify internal/mongo pools.
Applied to files:
src/server/system_services/pool_server.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-package-lock-validation
- GitHub Check: Build Noobaa Image
- GitHub Check: run-jest-unit-tests
🔇 Additional comments (2)
src/server/system_services/pool_server.js (2)
1193-1193
: LGTM!The variable introduction follows the established pattern and properly defaults to 0 when
storage_by_mode.LOW_CAPACITY
is undefined.
1195-1197
: Approve the logic change.The change correctly excludes LOW_CAPACITY hosts from the storage issues ratio calculation, which aligns with the PR objective. Since LOW_CAPACITY is handled separately by the
free_ratio
andfree
checks (lines 1215, 1225-1226), this prevents the bucket-store from incorrectly entering MANY_STORAGE_ISSUES state after scaling up.The clarifying comment appropriately explains the rationale.
const storage_issues_ratio = ((storage_count - storage_optimal) / storage_count) * 100; | ||
//don't count individual storage with low capacity as having issues. | ||
//low capacity is handled for the entire BS by free_ratio check below | ||
const storage_issues_ratio = ((storage_count - storage_optimal - storage_low_capacity) / storage_count) * 100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Verify handling of zero storage_count
.
If storage_count
is zero, the calculation (storage_count - storage_optimal - storage_low_capacity) / storage_count
results in NaN
. This would cause the storage issue ratio checks (lines 1218, 1220-1221) to fail silently since NaN >= 90
evaluates to false
.
Confirm whether storage_count
can be zero when host_count > 0
(e.g., when only S3 gateway nodes are present). If so, consider adding a guard condition or defaulting storage_issues_ratio
to 0 when storage_count === 0
.
Note: Line 1202 has similar logic for s3_issues_ratio
, suggesting this may be a pre-existing pattern.
Run the following script to check if there are any guards or validations for zero counts elsewhere in the codebase:
🏁 Script executed:
#!/bin/bash
# Description: Search for similar ratio calculations and guards for zero denominators
# Search for ratio calculations with potential division by zero
rg -n -C3 'ratio.*=.*\/.*count' src/server/system_services/pool_server.js
# Search for guards checking if count is zero before division
rg -n -C3 'count.*===.*0|count.*>.*0.*ratio' src/server/system_services/pool_server.js
Length of output: 1635
🏁 Script executed:
#!/bin/bash
# Search for storage_issues_ratio definition and nearby guards
rg -n -C3 'storage_issues_ratio' src/server/system_services/pool_server.js
# Search for any if-statements checking storage_count before use
rg -n -C3 'if.*storage_count' src/server/system_services/pool_server.js
# Search for s3_issues_ratio definition and nearby guards
rg -n -C3 's3_issues_ratio' src/server/system_services/pool_server.js
# Search for any if-statements checking s3_issues_ratio or its denominator
rg -n -C3 'if.*s3_issues_ratio' src/server/system_services/pool_server.js
Length of output: 2783
Guard against zero denominators when computing storage_issues_ratio
and s3_issues_ratio
. Default each ratio to 0 if its count is 0 (e.g. replace lines 1197 and 1202 with count > 0 ? (…) : 0
) to avoid NaN.
🤖 Prompt for AI Agents
In src/server/system_services/pool_server.js around lines 1197 and 1202, the
calculations for storage_issues_ratio and s3_issues_ratio can divide by zero
producing NaN; update each assignment to check the denominator count
(storage_count and s3_count) and return 0 when the count is 0, otherwise compute
the percentage as before (i.e., use a ternary like count > 0 ? (numerator/
count) * 100 : 0) so the ratios default to 0 when their counts are zero.
…s 4152) Signed-off-by: Amit Prinz Setter <[email protected]>
de248c1
to
eeef74a
Compare
const storage_count = hosts.by_service.STORAGE; | ||
const storage_offline = storage_by_mode.OFFLINE || 0; | ||
const storage_optimal = storage_by_mode.OPTIMAL || 0; | ||
const storage_low_capacity = storage_by_mode.LOW_CAPACITY || 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about NO_CAPACITY
mode ?
Worth going over all other modes (enum here) and see if they can also be excluded from MANY_STORAGE_ISSUES
.
Maybe anything that can be a result of a user operation (e.g.DELETING
) or a temp thing (INITIALIZING
) can also be ignored. @alphaprinz @nimrod-becker WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to go with a no.
-Temp states will go away and no one will care about them.
-User actions affecting the state makes sense.
-NO_CAPACITY is tricky to ignore because if some other issues incapacitates other hosts, you're left with nothing (as opposed to LOW_CAPACITY which would still allow you some operational uptime).
Describe the Problem
Customer has 1 host in BS.
This host reaches less than 20% free capacity, so we issue a LOW_CAPACITY state.
CU adds a host to BS, system is writable and has a lot of free space.
BS state is not MANY_STORAGE_ISSUES, despite being functional and having lots of space on new host.
Explain the Changes
(Low capacity state is handled by "free_ratio" and "free" checks)
Issues: Fixed #xxx / Gap #xxx
Testing Instructions:
Summary by CodeRabbit
Bug Fixes
Documentation