[batch] Use `earlyoom` to terminate a process using too much memory #1402
Draft
giordano wants to merge 1 commit into UCL:master from
Linux's built-in handling of out-of-memory (OOM) errors is unreliable and uninformative: it often kicks in too late, when the machine is already unresponsive, and even when it does kick in there's no easy way to tell that a process on a remote machine was terminated because it was using too much memory. With `earlyoom` we bypass Linux's OOM management and have better control over what's printed to screen in case of error.
giordano commented Mar 11, 2025
    # Terminate the `earlyoom` process
    kill "${{{{EARLYOOM_PID}}}}"
    # Print line(s) in the log file referencing terminated processes, if any.
    grep "^sending SIGTERM to process" "${{{{EARLYOOM_LOG}}}}"
This also catches SIGKILL (note `-E`: without it, grep treats `+` as a literal character):
Suggested change
    - grep "^sending SIGTERM to process" "${{{{EARLYOOM_LOG}}}}"
    + grep -E "^sending \w+ to process" "${{{{EARLYOOM_LOG}}}}"
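The pattern can be checked offline against a small sample log. The sketch below is only an illustration: the log lines are hypothetical (only the `sending <SIGNAL> to process` prefix comes from the diff above), and it uses `[A-Z]+` as a portable stand-in for the GNU-specific `\w+`.

```shell
# Hypothetical log excerpt; only the "sending <SIGNAL> to process" prefix
# is taken from the diff above, the rest of each line is made up.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
mem avail:   400 of  8000 MiB ( 5.00%), swap free:    0 of    0 MiB ( 0.00%)
sending SIGTERM to process 1234 uid 1000 "python": badness 900, VmRSS 7000 MiB
sending SIGKILL to process 1234 uid 1000 "python": badness 900, VmRSS 7000 MiB
EOF

# -E enables extended regular expressions, where `+` means "one or more".
# In grep's default (basic) mode a bare `+` is a literal plus sign.
matches=$(grep -cE '^sending [A-Z]+ to process' "$sample_log")
echo "matched lines: ${matches}"

rm -f "$sample_log"
```

With this sample, both the SIGTERM and the SIGKILL line are counted, while the status line is ignored.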
Comment on lines +252 to +253
    # Start the `earlyoom` daemon.
    nohup earlyoom -m 95 -s 100 &> "${{{{EARLYOOM_LOG}}}}" &
I got the `-m` argument backward: its value is the percentage of memory available, not used, so it should be `-m 5`:
Suggested change
    - # Start the `earlyoom` daemon.
    - nohup earlyoom -m 95 -s 100 &> "${{{{EARLYOOM_LOG}}}}" &
    + # Start the `earlyoom` daemon. Ignore swap usage, terminate process if less than 5% of main memory available.
    + nohup earlyoom -m 5 -s 100 &> "${{{{EARLYOOM_LOG}}}}" &
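Put together, the start / run / stop / report sequence from the two snippets could look roughly like the sketch below. It is an assumption-laden illustration, not the code in this PR: the wrapper function `run_with_earlyoom` is hypothetical, the log file location is chosen with `mktemp` because the diff does not show how `EARLYOOM_LOG` is set, and it is written as plain shell without the `{{{{ }}}}` brace escaping used in the diff.

```shell
# Sketch: run a command while earlyoom watches memory, then report any kills.
# `run_with_earlyoom` is a hypothetical helper, not part of the PR.
run_with_earlyoom() {
    # Fall back to running the command directly if earlyoom is not installed.
    if ! command -v earlyoom > /dev/null 2>&1; then
        echo "earlyoom not found, running without it" >&2
        "$@"
        return $?
    fi

    # Hypothetical log location; the PR does not show how EARLYOOM_LOG is set.
    EARLYOOM_LOG=$(mktemp)

    # -m 5:   act when available memory drops to 5% of total or less.
    # -s 100: free-swap threshold of 100%, so swap never blocks the decision.
    nohup earlyoom -m 5 -s 100 > "${EARLYOOM_LOG}" 2>&1 &
    EARLYOOM_PID=$!

    # Run the actual workload and remember its exit status.
    "$@"
    status=$?

    # Terminate the earlyoom process.
    kill "${EARLYOOM_PID}" 2> /dev/null

    # Print log lines about terminated processes, if any.
    # grep exits non-zero when nothing matches; that is not an error here.
    grep -E '^sending [A-Z]+ to process' "${EARLYOOM_LOG}" || true

    rm -f "${EARLYOOM_LOG}"
    return "${status}"
}
```

A call such as `run_with_earlyoom ./my_job.sh` (with `my_job.sh` standing in for the real workload) would then return the workload's exit status and print any `sending ... to process` lines that `earlyoom` logged while it ran.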
Ref: #1333 (comment).
Note: opening as draft because I haven't tested this in production yet.