Kubernetes resources created for an application should be deleted when the application finishes #519
Comments
The service should have an owner reference that ties back to the driver pod. So when the driver pod is deleted from the cluster, the service should go down with it. Can you post the result of |
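For reference, an owner reference on the service would look roughly like the sketch below. It builds the manifest as a plain Python dict; the function name and its arguments are made up for illustration, and only the manifest field names (`ownerReferences`, `clusterIP`, and so on) are taken from the Kubernetes API.

```python
def headless_service_manifest(service_name, driver_pod_name, driver_pod_uid):
    """Build a headless Service manifest owned by the driver pod (illustrative only)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": service_name,
            # The garbage collector removes this Service once the owning pod is deleted.
            "ownerReferences": [
                {
                    "apiVersion": "v1",
                    "kind": "Pod",
                    "name": driver_pod_name,
                    "uid": driver_pod_uid,
                    "controller": True,
                }
            ],
        },
        # clusterIP "None" is what makes the Service headless.
        "spec": {"clusterIP": "None"},
    }
```

Note that the owner reference only helps once the driver pod itself is deleted; a completed pod that is still present keeps its dependents alive.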
Although the owner-ref and GC will kick in when the pod is deleted, we can actually delete the service earlier if the submission client is waiting and sees that the driver pod has completed.
Running
I do see the |
The driver pod has to be deleted in order for the dependent objects to be destroyed. Right now we don't delete the pod once the application finishes. It's useful to keep the pod around so users can debug and collect logs before it is destroyed, but maybe we should add an option for the driver to attempt to delete itself upon completion.
Yes, it's definitely useful for the driver pod to stay around. Alternatively, the submission client could delete all the Kubernetes resources it created for the application once the application completes. This guarantees that those resources get cleaned up regardless of whether the driver pod is deleted.
The submission client can have "fire and forget" semantics, though, which means it doesn't have to remain running after the driver starts. In that case only the driver pod can be responsible for managing its Kubernetes resources.
Then the best we can do is probably to make it an option for the driver pod to delete itself upon stopping. The default grace period of 30 seconds should be enough for the driver to clean up and terminate.
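As a rough illustration of the self-deletion idea, the delete request for the driver pod could carry options like the ones below. This is only a sketch: the helper name is hypothetical, and the 30-second default simply mirrors the default grace period mentioned above. `gracePeriodSeconds` and `propagationPolicy` are fields of the Kubernetes `DeleteOptions` object.

```python
def driver_self_delete_options(grace_period_seconds=30):
    """Build DeleteOptions for deleting the driver pod (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "DeleteOptions",
        # Time the pod gets to shut down before it is killed.
        "gracePeriodSeconds": grace_period_seconds,
        # Background: the pod is deleted first, then the garbage collector
        # removes dependents that reference it as their owner.
        "propagationPolicy": "Background",
    }
```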
I think there is always value in keeping the driver pod around for logs and having its lifetime be controlled by a user. I'd rather we have a "not-fire-and-forget" mode in which the submission client cleans up everything except the driver. I thought |
Yes, for non-fire-and-forget, we are covered if the submission client deletes the resources. The problem is the fire-and-forget case: users would likely be surprised to discover leaked resources while the driver pod is still around, and some resources, such as the secret for small files and the headless service, are supposed to be internal.
@foxish made a good point: the driver would need the right RBAC roles to be able to delete the resources. It makes much more sense for the submission client to clean things up, as it already has the permissions to do so.
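The submission-client cleanup discussed here could be sketched as follows. This is not the actual Spark implementation: the function, its parameters and the `delete_resource` callback are all hypothetical, and the only Kubernetes facts assumed are the terminal pod phases `Succeeded` and `Failed`.

```python
TERMINAL_PHASES = {"Succeeded", "Failed"}


def cleanup_after_completion(driver_phase, created_resources, delete_resource,
                             wait_app_completion=True):
    """Delete everything the client created except the driver pod.

    created_resources is a list of (kind, name) tuples; delete_resource is a
    caller-supplied callback that performs the actual deletion. Returns the
    list of resources that were deleted.
    """
    # Fire-and-forget mode, or a driver that is still running: do nothing.
    if not wait_app_completion or driver_phase not in TERMINAL_PHASES:
        return []
    deleted = []
    for kind, name in created_resources:
        if kind == "Pod":
            continue  # keep the driver pod around for logs and debugging
        delete_resource(kind, name)
        deleted.append((kind, name))
    return deleted
```

In fire-and-forget mode this does nothing, so the leak described above remains unless the user deletes the driver pod and lets the owner references take effect.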
#483 started to create a headless service for the driver pod. However, this service stays around after an application has finished and must be deleted manually. I think the submission client should instead be responsible for deleting the service automatically when spark.kubernetes.submission.waitAppCompletion is true. This also applies to other Kubernetes resources, such as the secret for small files shipped via spark.files.
For example, running kubectl get services gave the following output, although the application pubsub-wordcount had finished a day earlier.