This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Kubernetes resources created for an application should be deleted when the application finishes #519

Open
liyinan926 opened this issue Oct 5, 2017 · 10 comments


@liyinan926
Member

liyinan926 commented Oct 5, 2017

#483 started creating a headless service for the driver pod. However, this service stays around after an application finishes and must be deleted manually. I think the submission client should instead be responsible for deleting the service automatically when spark.kubernetes.submission.waitAppCompletion is true. This also applies to other Kubernetes resources, such as the secret for small files shipped via spark.files.

For example, running kubectl get services gave the following output, even though the application pubsub-wordcount had finished a day earlier.

NAME                                        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
kubernetes                                  10.0.0.1     <none>        443/TCP             1d
pubsub-wordcount-1507066935952-driver-svc   None         <none>        7078/TCP,7079/TCP   1d
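The cleanup proposed here could look roughly like the minimal Python sketch below. It assumes the submission client keeps a record of every resource it creates and can observe the driver pod's phase. CreatedResource, ResourceDeleter and cleanup_after_completion are hypothetical names and are not part of Spark. The driver pod is deliberately skipped so that its logs stay available. kubectl shows a pod whose containers exited successfully as Completed, but its API phase is Succeeded.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CreatedResource:
    """A Kubernetes object the submission client created (hypothetical record)."""
    kind: str       # e.g. "Pod", "Service", "Secret"
    name: str
    namespace: str


class ResourceDeleter(Protocol):
    """Hypothetical stand-in for whatever Kubernetes client does the deletes."""
    def delete(self, resource: CreatedResource) -> None: ...


# Pod phases after which the driver will not run again.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def cleanup_after_completion(
    created: list[CreatedResource],
    driver_phase: str,
    wait_app_completion: bool,
    deleter: ResourceDeleter,
) -> list[CreatedResource]:
    """Delete every created resource except the driver pod.

    Does nothing unless the client waited for the application
    (waitAppCompletion=true) and the driver reached a terminal phase.
    Returns the resources it deleted, in order.
    """
    if not wait_app_completion or driver_phase not in TERMINAL_PHASES:
        return []
    deleted = []
    for res in created:
        if res.kind == "Pod":
            continue  # keep the driver pod for logs and debugging
        deleter.delete(res)
        deleted.append(res)
    return deleted
```

Passing the deleter in as a parameter keeps the decision logic separate from the Kubernetes client. You can check it with a fake that only records calls.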

@mccheah

mccheah commented Oct 5, 2017

The service should have an owner reference that ties back to the driver pod. So when the driver pod is deleted from the cluster, the service should go down with it. Can you post the result of kubectl get service <driver-service-name> -n <driver-namespace>?

@foxish
Member

foxish commented Oct 5, 2017

Although the owner-ref and GC will kick in when the pod is deleted, we can actually delete the service earlier if the submission client is waiting and sees that the driver pod has completed.

@liyinan926
Member Author

liyinan926 commented Oct 5, 2017

Running kubectl get service pubsub-wordcount-1507227999031-driver-svc -o=yaml:

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2017-10-05T18:26:40Z
  name: pubsub-wordcount-1507227999031-driver-svc
  namespace: default
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Pod
    name: pubsub-wordcount-1507227999031-driver
    uid: baed290a-a9fa-11e7-b535-08002703730e
  resourceVersion: "175448"
  selfLink: /api/v1/namespaces/default/services/pubsub-wordcount-1507227999031-driver-svc
  uid: bb0165fd-a9fa-11e7-b535-08002703730e
spec:
...

I do see the ownerReference, though. The driver finished successfully, leaving the driver pod in Completed status. But if I understand how the GC works correctly, this won't cause it to kick in and delete the service.

@mccheah

mccheah commented Oct 5, 2017

The driver pod has to be deleted in order for the dependent objects to be destroyed. Right now we don't delete the pod once the application finishes. It's useful to keep the pod around so users can debug and collect logs before destroying it completely, but maybe we should add an option to have the driver attempt to delete itself upon completion.
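The garbage-collection behavior can be shown with a simplified Python model. This is an assumption-laden sketch, not the Kubernetes implementation. In the model, a dependent is collected only when none of its owners exist any more. The owner's status never enters the decision, which is why a driver pod in phase Succeeded keeps its headless service alive. The UIDs come from the YAML posted earlier in the thread. Obj and collectable are hypothetical names. The model ignores finalizers, propagation policies, and multi-level cascades.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Obj:
    kind: str
    name: str
    uid: str
    phase: Optional[str] = None                      # only meaningful for pods
    owner_uids: list[str] = field(default_factory=list)


def collectable(objects: list[Obj]) -> list[Obj]:
    """Return dependents the garbage collector would delete (simplified).

    An object with owner references is collected once every owner it
    references is absent. A still-existing owner blocks collection
    regardless of its phase.
    """
    live_uids = {o.uid for o in objects}
    return [
        o for o in objects
        if o.owner_uids and not any(uid in live_uids for uid in o.owner_uids)
    ]
```

Consider the service from the YAML above. While its owning pod object exists, even in phase Succeeded, collectable returns nothing. Once the pod is removed from the list, the service is returned.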

@liyinan926
Member Author

Yes, it's definitely useful for the driver pod to stay around. Alternatively, the submission client could delete all the Kubernetes resources it created for the application once the application completes. This guarantees that those resources get cleaned up regardless of whether the driver pod is deleted.

@mccheah

mccheah commented Oct 5, 2017

The submission client can have "fire and forget" semantics, though, which means it doesn't have to keep running after the driver starts. In that case, only the driver pod can be responsible for managing its Kubernetes resources.

@liyinan926
Member Author

Then the best we can probably do is make it an option for the driver pod to delete itself when it stops. The default grace period of 30 seconds should be enough for the driver to clean up and terminate.
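A driver-side version of this option might look like the hedged Python sketch below. The actual Spark code is Scala. The sketch assumes core_v1 behaves like CoreV1Api from the official Kubernetes Python client, whose delete_namespaced_pod accepts grace_period_seconds and propagation_policy keyword arguments. The enabled argument stands in for a hypothetical opt-in config flag, and delete_self_on_stop is a hypothetical name. Background propagation asks the garbage collector to remove the owner-referenced service after the pod is gone.

```python
# Matches Kubernetes' default terminationGracePeriodSeconds.
DEFAULT_GRACE_PERIOD_SECONDS = 30


def delete_self_on_stop(core_v1, pod_name: str, namespace: str,
                        enabled: bool,
                        grace_period_seconds: int = DEFAULT_GRACE_PERIOD_SECONDS) -> bool:
    """Ask the API server to delete the driver's own pod.

    Off unless explicitly enabled, so by default the pod stays around
    for logs. Returns True if a delete request was issued.
    """
    if not enabled:
        return False
    core_v1.delete_namespaced_pod(
        name=pod_name,
        namespace=namespace,
        # The container gets SIGTERM and this long to finish cleanup.
        grace_period_seconds=grace_period_seconds,
        # Let the garbage collector delete dependents (e.g. the headless
        # service) in the background once the pod object is removed.
        propagation_policy="Background",
    )
    return True
```

For this to work, the driver's service account needs permission to delete its own pod.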

@foxish
Member

foxish commented Oct 5, 2017

I think there is always value in keeping the driver pod around for logs and letting the user control its lifetime. I'd rather we have a "not-fire-and-forget" mode in which the submission client cleans up everything except the driver pod. I thought spark.kubernetes.submission.waitAppCompletion was that flag: it can indicate that we want to clean up everything except the pod after the job completes.

@liyinan926
Member Author

Yes, for non-fire-and-forget mode we are covered if the submission client deletes the resources. The problem is the fire-and-forget case: users would likely be surprised to find leaked resources while the driver pod is still around. This matters more because some of them, such as the secret for small files and the headless service, are supposed to be internal.

@liyinan926
Member Author

@foxish made a good point: the driver would need the right RBAC roles to be able to delete the resources. It makes much more sense for the submission client to clean things up, as it already has the permissions to do so.
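Driver-side cleanup would require granting the driver's service account a namespaced Role like the one below. It is written as a Python dict in the rbac.authorization.k8s.io/v1 format and would still need a RoleBinding, which is not shown. The role name and the driver_cleanup_role and allows helpers are hypothetical. The matching logic is simplified and ignores wildcards, resourceNames, and subresources.

```python
def driver_cleanup_role(namespace: str, name: str = "spark-driver-cleanup") -> dict:
    """Build a Role letting a driver delete the objects created for it."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["pods", "services", "secrets"],
                "verbs": ["delete"],
            }
        ],
    }


def allows(role: dict, verb: str, resource: str, api_group: str = "") -> bool:
    """Simplified check of whether any rule grants `verb` on `resource`."""
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in role["rules"]
    )
```

Every cluster running in fire-and-forget mode would need this extra grant. The submission client, by contrast, already holds these permissions because it created the objects.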

@liyinan926 liyinan926 changed the title Headless service created for the driver should be deleted when the job finishes Kubernetes resources created for an application should be deleted when the application finishes Oct 6, 2017
3 participants