Draft: Change OOM protection logic #1317

Closed

DiegoTavares
Collaborator

The current logic relies on hardcoded values which are not suitable for large hosts. The new logic takes into account the size of hosts and also tries to be more aggressive with misbehaving frames.

Prevent the host from entering an OOM state where the oom-killer might start killing important OS processes. The kill logic kicks in when one of the following conditions is met:

  • Host has less than OOM_MEMORY_LEFT_THRESHOLD_PERCENT memory available
  • A frame is taking more than OOM_FRAME_OVERBOARD_PERCENT of what it had reserved

For frames that are using more than they had reserved but not above the threshold, negotiate expanding the reservations with other frames on the same host.
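
As a rough illustration of the two conditions and the negotiation fallback described above, here is a minimal Python sketch. It is not the code from this PR: the threshold values, the `Frame` and `Action` types, and both function names are hypothetical, and it assumes "more than OOM_FRAME_OVERBOARD_PERCENT of what it had reserved" means usage exceeding the reservation by more than that percentage. It also assumes every reservation is greater than zero.

```python
from dataclasses import dataclass
from enum import Enum

# Placeholder values: the PR names these settings but does not state their values.
OOM_MEMORY_LEFT_THRESHOLD_PERCENT = 5.0
OOM_FRAME_OVERBOARD_PERCENT = 20.0


class Action(Enum):
    NONE = "none"            # frame is within its reservation
    NEGOTIATE = "negotiate"  # over its reservation, but below the overboard threshold
    KILL = "kill"            # over its reservation by more than the threshold


@dataclass
class Frame:
    name: str
    reserved_kb: int  # assumed to be > 0
    used_kb: int


def host_is_low_on_memory(total_kb: int, free_kb: int) -> bool:
    """Host-level condition: available memory, as a percentage of total, is below the threshold."""
    return 100.0 * free_kb / total_kb < OOM_MEMORY_LEFT_THRESHOLD_PERCENT


def frame_action(frame: Frame) -> Action:
    """Frame-level condition, under the assumed reading of 'overboard'."""
    if frame.used_kb <= frame.reserved_kb:
        return Action.NONE
    overage_percent = 100.0 * (frame.used_kb - frame.reserved_kb) / frame.reserved_kb
    if overage_percent > OOM_FRAME_OVERBOARD_PERCENT:
        return Action.KILL
    return Action.NEGOTIATE
```

Because both checks are percentages rather than fixed byte counts, the same settings scale with the host: 5% left on a 64 GB host and on a 1 TB host are very different absolute amounts, which is the motivation stated at the top of this description.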

(cherry picked from commit e88a5295f23bd927614de6d5af6a09d496d3e6ac)
Signed-off-by: Diego Tavares [email protected]

@DiegoTavares DiegoTavares changed the title Change OOM protection logic Draft: Change OOM protection logic Sep 15, 2023
@DiegoTavares
Collaborator Author

Stand by for review. Still working on a loose end found while running this in production.

(cherry picked from commit b88f7bcb1ad43f83fb8357576c33483dc2bf4952)
(cherry picked from commit 647e75e2254c7a7ff68c544e438080f412bf04c1)
(cherry picked from commit aea4864ef66aca494fb455a7c103e4a832b63d41)
@DiegoTavares
Collaborator Author

Will submit another PR to avoid having to deal with conflicts.
