Remove DeletionTimestamp!=nil condition in IsCompletePod function #221

Open
wants to merge 2 commits into
base: master

zhangbc97

This PR fixes a Pod management problem by removing the DeletionTimestamp != nil check from the IsCompletePod function. A set DeletionTimestamp only means that the Pod has started to shut down. It does not mean that the service running in the Pod has released its GPU memory. If the shutdown takes a long time, treating the Pod as complete in the meantime can lead to GPU memory OOM for services. This happens with some probability on single machines with multiple GPUs, so the check should be removed to avoid the risk.
