-
Notifications
You must be signed in to change notification settings - Fork 1k
fix(cluster_family): Cancel slot migration from incoming node on OOM #5000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
d063d60
116ee72
df4a4c9
dc0005f
196805c
3622fd3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1405,6 +1405,10 @@ bool Service::InvokeCmd(const CommandId* cid, CmdArgList tail_args, | |
} | ||
|
||
if (std::string reason = builder->ConsumeLastError(); !reason.empty()) { | ||
// Set flag if OOM reported | ||
if (reason == kOutOfMemory) { | ||
mkaruza marked this conversation as resolved.
Show resolved
Hide resolved
|
||
cmd_cntx.conn_cntx->is_oom_ = true; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we put is_oom_ into command context? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we can do refactoring and propagate it? I'm ok to do this in separate PR if it is possible There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, let's talk to improve this in separate PR There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't want to push any refactor for a single flow. As I wrote above, let's not rush generic solutions for non generic problems. When the time comes and we need the callers to check for reply builder errors we can discuss. Until then, sayonara as we have more important things to deal with. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mkaruza Could you create a ticket to not forget regarding this refactoring There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Guys, valid points on both sides. I agree that is_oom_ is conceptually tied to the command rather than the connection, so placing it in CommandContext makes sense from a design perspective. However, since CommandContext isn’t available during the incoming slot migration, I support the idea of deferring this to a separate PR . I will follow up with Mario to understand the effort and if we want to do this If this is a big change and the proposed refactor is non-trivial . There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I can argue the other way but this is not my point. My point is that we have other more important things to deal with rn than this refactor. I am not the one who steers product direction or what to prioritize but I would ask first before considering this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think @BorysTheDev raised a design concern. His suggestion to handle it in a separate PR seems reasonable and doesn't block this one. The point about broader priorities is important, but I don't think it conflicts with acknowledging and tracking design inconsistencies. It’s not about shifting priorities now, but recognizing a spot where the design could be improved. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. LGTM |
||
} | ||
VLOG(2) << FailedCommandToString(cid->name(), tail_args, reason); | ||
LOG_EVERY_T(WARNING, 1) << FailedCommandToString(cid->name(), tail_args, reason); | ||
} | ||
|
Uh oh!
There was an error while loading. Please reload this page.