Remove DeletionTimestamp!=nil condition in IsCompletePod function #221
This PR removes the DeletionTimestamp check from the IsCompletePod function. A non-nil DeletionTimestamp only indicates that the Pod has started its exit sequence; it does not mean that the service running in the Pod has released its GPU memory. If the Pod takes a long time to exit, treating it as complete can cause services to hit GPU memory OOM. This happens with some probability on single-machine, multi-GPU nodes, so the check is removed to avoid that risk.