Add documentation for Zeppelin with Spark on Kubernetes (#21)
---
layout: global
displayTitle: Apache Zeppelin running with Spark on Kubernetes
title: Apache Zeppelin running with Spark on Kubernetes
description: User Documentation for Apache Zeppelin running with Spark on Kubernetes
---

This page is an ongoing effort to describe how to run Apache Zeppelin with Spark on Kubernetes.

> At the time of writing, the needed code is not yet integrated into the `master` branches of either the `apache-zeppelin` or the `apache-spark-on-k8s/spark` repository.
> You are nevertheless welcome to try it out already and send any feedback and questions.

First things first, you have to choose between the following modes in which you will run Zeppelin with Spark on Kubernetes:

+ The `Kubernetes mode`: can be `in-cluster` (within a Pod) or `out-cluster` (from outside the Kubernetes cluster).
+ The `Spark deployment mode`: can be `client` or `cluster`.

Only three combinations of these options are supported:

1. `in-cluster` with `spark-client` mode.
2. `in-cluster` with `spark-cluster` mode.
3. `out-cluster` with `spark-cluster` mode.

For now, to be able to test these combinations, you need to build specific branches (see hereafter) or to use third-party Helm charts or Docker images. The needed branches and their related pull requests are listed here:

1. In-cluster client mode: [see pull request #456](https://github.com/apache-spark-on-k8s/spark/pull/456)
2. Add support to run the Spark interpreter on a Kubernetes cluster: [see pull request #2637](https://github.com/apache/zeppelin/pull/2637)

## In-Cluster with Spark-Client



Build a new Zeppelin based on [#456 In-cluster client mode](https://github.com/apache-spark-on-k8s/spark/pull/456).

Once done, deploy that new build in a Kubernetes Pod with the following interpreter settings:

+ `spark.app.name`: Do not set any name, Zeppelin will pick one for you.
+ `spark.master`: `k8s://https://kubernetes:443`
+ `spark.submit.deployMode`: `client`
+ `spark.kubernetes.driver.pod.name`: The name of the pod where your Zeppelin instance is running.
+ Any other `spark.kubernetes` properties you need to make your Spark work (see [Running Spark on Kubernetes](./running-on-kubernetes.html)), such as `spark.kubernetes.initcontainer.docker.image`, `spark.kubernetes.driver.docker.image`, `spark.kubernetes.executor.docker.image`...
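In client mode the driver runs inside the Zeppelin pod itself, so `spark.kubernetes.driver.pod.name` must be the Zeppelin pod's own name. A minimal sketch for looking that name up from inside the pod, assuming the Kubernetes default that a pod's hostname equals its pod name:

```shell
# Sketch, not part of the PRs above: inside a pod, the hostname defaults
# to the pod name, so it can serve as the driver pod name in client mode.
POD_NAME=$(hostname)
echo "spark.kubernetes.driver.pod.name=${POD_NAME}"
```

If your pod spec overrides `hostname`, use the Downward API to expose `metadata.name` as an environment variable instead.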

## In-Cluster with Spark-Cluster



Build a new Zeppelin based on [#2637 Spark interpreter on a Kubernetes cluster](https://github.com/apache/zeppelin/pull/2637).

Once done, deploy that new build in a Kubernetes Pod with the following interpreter settings:

+ `spark.app.name`: Do not set any name, Zeppelin will pick one for you.
+ `spark.master`: `k8s://https://kubernetes:443`
+ `spark.submit.deployMode`: `cluster`
+ `spark.kubernetes.driver.pod.name`: Do not set this property.
+ Any other `spark.kubernetes` properties you need to make your Spark work (see [Running Spark on Kubernetes](./running-on-kubernetes.html)), such as `spark.kubernetes.initcontainer.docker.image`, `spark.kubernetes.driver.docker.image`, `spark.kubernetes.executor.docker.image`...
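The `k8s://https://kubernetes:443` value relies on the in-cluster DNS name of the API server. A sketch of how the same URL can be derived from the `KUBERNETES_SERVICE_HOST`/`KUBERNETES_SERVICE_PORT` variables that Kubernetes injects into every pod (the fallback values here are assumptions matching the defaults used above):

```shell
# Sketch: build the in-cluster spark.master URL from the injected
# service environment; fall back to the in-cluster DNS defaults.
K8S_HOST=${KUBERNETES_SERVICE_HOST:-kubernetes}
K8S_PORT=${KUBERNETES_SERVICE_PORT:-443}
SPARK_MASTER="k8s://https://${K8S_HOST}:${K8S_PORT}"
echo "${SPARK_MASTER}"
```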

## Out-Cluster with Spark-Cluster



Build a new Spark and its associated Docker images based on [#2637 Spark interpreter on a Kubernetes cluster](https://github.com/apache/zeppelin/pull/2637).

Once done, any vanilla Apache Zeppelin deployed in a Kubernetes Pod (you can use a Helm chart for this) will work out of the box with the following interpreter settings:

+ `spark.app.name`: Do not set any name, Zeppelin will pick one for you.
+ `spark.master`: `k8s://https://ip-address-of-the-kube-api:6443` (the port may depend on your setup)
+ `spark.submit.deployMode`: `cluster`
+ `spark.kubernetes.driver.pod.name`: Do not set this property.
+ Any other `spark.kubernetes` properties you need to make your Spark work (see [Running Spark on Kubernetes](./running-on-kubernetes.html)), such as `spark.kubernetes.initcontainer.docker.image`, `spark.kubernetes.driver.docker.image`, `spark.kubernetes.executor.docker.image`...
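From outside the cluster, `spark.master` must point at the externally reachable API server address. A sketch of forming that value (the address here is the same placeholder used above; in practice you can read the real one from your kubeconfig, e.g. with `kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'`):

```shell
# Sketch: prefix the cluster API server address with "k8s://" to get
# the spark.master value. The address below is a placeholder.
API_SERVER="https://ip-address-of-the-kube-api:6443"
SPARK_MASTER="k8s://${API_SERVER}"
echo "${SPARK_MASTER}"
```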