This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Use a headless service to give a hostname to the driver. #483

Merged: 8 commits into branch-2.2-kubernetes, Sep 8, 2017

Conversation

@mccheah commented Sep 7, 2017

Required since SPARK-21642 was added upstream. Closes #482.

s" managed via a Kubernetes service.")

val preferredServiceName = s"$kubernetesResourceNamePrefix$DRIVER_SVC_POSTFIX"
val resolvedServiceName = if (preferredServiceName.length <= MAX_SERVICE_NAME_LENGTH) {
mccheah (Author):

Because the name prefix is derived from the app name, but we don't want the app name to be constrained by service name limits. Nevertheless, this is still far from ideal, and I would appreciate suggestions on how we can do this better.

Reply:

Because the service name isn't nearly as visible to users as pod names are, I don't think naming is all that important here. What you have seems fine to me!

Reply (Member):

We might expose the driver UI using the DNS name now. So, I think it would be good to make sure it's intelligible.

mccheah (Author):

There isn't a great way to shorten the name given a long app name though. It should be fine to make the name less intelligible in the corner cases where the given app name is long.
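
A minimal sketch of the truncation fallback being discussed in this thread, assuming a short random name when the preferred name exceeds the limit; the constant values and the exact fallback scheme are illustrative, not necessarily what was merged:

```scala
import java.util.UUID

// The Kubernetes DNS-label limit and the driver-service suffix from the diff.
val MAX_SERVICE_NAME_LENGTH = 63
val DRIVER_SVC_POSTFIX = "-driver-svc"

def resolveServiceName(kubernetesResourceNamePrefix: String): String = {
  val preferredServiceName = s"$kubernetesResourceNamePrefix$DRIVER_SVC_POSTFIX"
  if (preferredServiceName.length <= MAX_SERVICE_NAME_LENGTH) {
    preferredServiceName
  } else {
    // Fall back to a short random name so a long app name never pushes the
    // service name past the limit (illustrative fallback scheme).
    val randomServiceId = UUID.randomUUID().toString.replaceAll("-", "").take(10)
    s"spark-$randomServiceId$DRIVER_SVC_POSTFIX"
  }
}
```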

@ash211 commented Sep 7, 2017

We should cherry-pick SPARK-21642 into this PR to confirm this works with it.

@mccheah (Author) commented Sep 7, 2017

I wouldn't consider that a requirement for merging this into branch-2.2-kubernetes. We can double-check when we propose this to upstream master.

@ash211 commented Sep 7, 2017

If we're making this change so that it works with that commit, we should test against it at the same time as we make the fix. Otherwise we're not sure this works, and we'll have to figure it out again in the future with the context lost.

What's the harm of cherry picking it in?

@mccheah (Author) commented Sep 7, 2017

It's fine because we're hard-setting spark.driver.host here. SPARK-21642 only changes the default value of spark.driver.host. Since we now force spark.driver.host to a particular value, our implementation is agnostic to SPARK-21642.
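
As a rough sketch of what hard-setting spark.driver.host means here (the helper and its parameters are illustrative; the actual change wires this through the submission steps):

```scala
import org.apache.spark.SparkConf

// Pin spark.driver.host to the headless service's in-cluster FQDN, so the
// SPARK-21642 default (the driver host's canonical name) never takes effect.
def setDriverHost(conf: SparkConf, serviceName: String, namespace: String): SparkConf =
  conf.set("spark.driver.host", s"$serviceName.$namespace.svc.cluster.local")
```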

@mccheah (Author) commented Sep 7, 2017

Speaking of which, I just thought of another way to do this: have KubernetesClusterSchedulerBackend override the default value of spark.driver.host so that it reverts to the IP address of the driver pod. That would remove the need for a headless service at all.

But I like the idea of having an FQDN associated with every driver pod, and an FQDN-based approach fits the spirit of what SPARK-21642 was trying to accomplish in the first place anyway. The tradeoffs are worth considering, though.
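
A sketch of that alternative, assuming the pod IP is made available to the driver process (the environment variable name below is hypothetical):

```scala
import java.net.InetAddress
import org.apache.spark.SparkConf

// Alternative (not what this PR does): default spark.driver.host to the
// driver pod's IP instead of creating a headless service.
def defaultDriverHostToPodIp(conf: SparkConf): SparkConf = {
  // SPARK_DRIVER_POD_IP is a hypothetical env var, e.g. injected via the
  // Kubernetes Downward API; fall back to the local address otherwise.
  val podIp = sys.env.getOrElse("SPARK_DRIVER_POD_IP",
    InetAddress.getLocalHost.getHostAddress)
  conf.setIfMissing("spark.driver.host", podIp)
}
```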

@foxish (Member) commented Sep 7, 2017

The upstream JIRA introduced the FQDN to solve an SSL cert issue. Maybe we conditionally create the DNS name only if we have SSL turned on? I think that would be reasonable. In the other cases, the spark.driver.host override that you suggested might be okay?
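
A sketch of that conditional fork (ultimately not adopted; the SSL config key below is a hypothetical placeholder, not a real Spark property):

```scala
import org.apache.spark.SparkConf

// Fork the code paths: use the service FQDN only when SSL is enabled, since
// the cert needs a stable DNS name; otherwise just use the pod IP.
def chooseDriverHost(
    conf: SparkConf,
    serviceName: String,
    namespace: String,
    podIp: String): String = {
  if (conf.getBoolean("spark.kubernetes.driver.ssl.enabled", false)) { // hypothetical key
    s"$serviceName.$namespace.svc.cluster.local"
  } else {
    podIp
  }
}
```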

@mccheah (Author) commented Sep 7, 2017

How much overhead is it to create the headless service? I would rather not fork the code paths if possible to reduce the complexity.

.build()

val namespace = submissionSparkConf.get(KUBERNETES_NAMESPACE)
val driverHostname = s"${driverService.getMetadata.getName}.$namespace.svc.cluster.local"
mccheah (Author):

Does this entire string need to be <= 63 characters?

Reply:

Just each component between the dots.
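
For reference, a sketch of building the headless service with the fabric8 client builder used in this codebase; the selector labels and helper signature are illustrative assumptions:

```scala
import io.fabric8.kubernetes.api.model.{Service, ServiceBuilder}
import scala.collection.JavaConverters._

// Headless service for the driver: clusterIP "None" means DNS resolves the
// service name directly to the driver pod's IP. Only each DNS label between
// the dots is subject to the 63-character limit, not the whole FQDN.
def buildDriverService(
    serviceName: String,
    driverPodLabels: Map[String, String]): Service =
  new ServiceBuilder()
    .withNewMetadata()
      .withName(serviceName)
      .endMetadata()
    .withNewSpec()
      .withClusterIP("None")
      .withSelector(driverPodLabels.asJava)
      .endSpec()
    .build()
```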

@foxish (Member) commented Sep 7, 2017

> How much overhead is it to create the headless service? I would rather not fork the code paths if possible to reduce the complexity.

Yeah, it wouldn't be too bad. I think we should just create the service, then.
The other good thing is that, provided the apiserver proxy can understand the DNS name, we get a prettier URL for the driver UI.

@kimoonkim (Member)

It is possible for kube-dns to be momentarily unstable (I saw this first-hand a few times on our dev K8s cluster). That could affect executor-to-driver communication and may lead to job failure. Having this as an option may make jobs more resilient in cases where SSL is not used.

@mccheah (Author) commented Sep 7, 2017

Is the instability significant and frequent enough in most clusters to warrant the extra complexity here? Basically, I don't think the cases we'd optimize for by forking between the IP address and the headless service are worthwhile if the headless service works for us most of the time. I also foresee needing a service anyway to expose the Spark UI in the not-so-distant future.

@kimoonkim (Member)

> Is the instability significant and frequent enough in most clusters to warrant the extra complexity here?

I don't know the answer. I used to see it fail more often in kube 1.5, and kube 1.6 seems to have made it better. It will really depend on how much resource kube-dns has. Or maybe it has HA that will mitigate the risk often enough. @foxish, do we know how well kube-dns performs in general?

@foxish (Member) commented Sep 8, 2017

@kimoonkim I haven't seen kube-dns perform badly in production clusters. It may occasionally take a bit longer to resolve DNS names, but that's not common. Let's add the headless service anyway, and keep our eyes peeled for issues reported because of it; at that point, we can consider forking the code paths and overriding spark.driver.host like @mccheah said in #483 (comment). We should be good with this fix for now.

@kimoonkim (Member)

SGTM.

@ash211 left a comment

I'm taking that as a +1 from @kimoonkim and @foxish, so I'll merge this when builds are green.

Thanks @mccheah!

@ifilonenko (Member) commented Oct 16, 2017

I think this should be re-examined with reference to #523, as it is preventing some PRs (e.g. #514) from passing despite being correct. @mccheah @foxish @ash211

ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 26, 2019
Use a headless service to give a hostname to the driver. (apache-spark-on-k8s#483)

* Use a headless service to give a hostname to the driver.

Required since SPARK-21642 was added upstream.

* Fix scalastyle.

* Add back import

* Fix conflict properly.

* Fix orchestrator test.
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019
Use a headless service to give a hostname to the driver. (apache-spark-on-k8s#483)

* Use a headless service to give a hostname to the driver.

Required since SPARK-21642 was added upstream.

* Fix scalastyle.

* Add back import

* Fix conflict properly.

* Fix orchestrator test.