Add documentation for Zeppelin with Spark on Kubernetes #21

Open
wants to merge 9 commits into
base: master
4 changes: 2 additions & 2 deletions src/jekyll/index.md
@@ -24,9 +24,9 @@ This project was put up for voting in [an SPIP](http://apache-spark-developers-l
in August 2017 and passed. It is in the process of being
upstreamed into the apache/spark repository.


### Contents

* [Running Spark on Kubernetes](./running-on-kubernetes.html)
* [Running Spark in Cloud Environments](./running-on-kubernetes-cloud.html)
* [Contribute](./contribute.html)
* [Running Zeppelin with Spark on Kubernetes](./zeppelin.html)
* [Contribute](./contribute.html)
69 changes: 69 additions & 0 deletions src/jekyll/zeppelin.md
@@ -0,0 +1,69 @@
---
layout: global
displayTitle: Apache Zeppelin running with Spark on Kubernetes
title: Apache Zeppelin running with Spark on Kubernetes
description: User Documentation for Apache Zeppelin running with Spark on Kubernetes
---

This page is an ongoing effort to describe how to run Apache Zeppelin with Spark on Kubernetes.

> For the time being, the needed code is not integrated in the `master` branches of either the `apache-zeppelin` or the `apache-spark-on-k8s/spark` repositories.
> You are welcome to try it out already and send any feedback and questions.

First, choose the two modes in which you will run Zeppelin with Spark on Kubernetes:

+ The `Kubernetes modes`: Can be `in-cluster` (within a Pod) or `out-cluster` (from outside the Kubernetes cluster).

What is the proper terminology in the k8s world? Is "out-cluster" the right term?

Member Author

I had the same question; from the already used `in-cluster`, I deduced `out-cluster`. Happy to change to any more official terminology.

+ The `Spark deployment modes`: Can be `client` or `cluster`.

Only three combinations of these options are supported:

1. `in-cluster` with `spark-client` mode.
2. `in-cluster` with `spark-cluster` mode.
3. `out-cluster` with `spark-cluster` mode.
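
The three supported combinations can be captured in a small lookup, which may help when scripting a deployment. This is only a sketch: `SUPPORTED_COMBINATIONS` and `is_supported` are hypothetical names chosen for illustration and are not part of Zeppelin or Spark.

```python
# Hypothetical helper: these names are illustrative, not a Zeppelin or Spark API.
# Each entry is (Kubernetes mode, Spark deployment mode), as listed above.
SUPPORTED_COMBINATIONS = {
    ("in-cluster", "client"),
    ("in-cluster", "cluster"),
    ("out-cluster", "cluster"),
}


def is_supported(kubernetes_mode: str, deploy_mode: str) -> bool:
    """Return True if the pair is one of the three supported combinations."""
    return (kubernetes_mode, deploy_mode) in SUPPORTED_COMBINATIONS


print(is_supported("in-cluster", "client"))   # True
print(is_supported("out-cluster", "client"))  # False
```

The only unsupported pairing is `out-cluster` with `client`.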

For now, to test these combinations, you need to build specific branches (see below) or use third-party Helm charts or Docker images. The needed branches and related pull requests are listed here:

Member

As discussed in the meeting today, we want to ensure that these branches merge before we can publish documentation.

cc @felixcheung @erikerlandson @liyinan926 @mccheah


1. In-cluster client mode [see pull request #456](https://github.com/apache-spark-on-k8s/spark/pull/456)
2. Add support to run Spark interpreter on a Kubernetes cluster [see pull request #2637](https://github.com/apache/zeppelin/pull/2637)

## In-Cluster with Spark-Client

![In-Cluster with Spark-Client](/img/zeppelin_in-cluster_spark-client.png "In-Cluster with Spark-Client")

Build a new Zeppelin based on [#456 In-cluster client mode](https://github.com/apache-spark-on-k8s/spark/pull/456).

Once done, deploy that new build in a Kubernetes Pod with the following interpreter settings:

+ `spark.app.name`: Do not set any name; Zeppelin will pick one for you.
+ `spark.master`: k8s://https://kubernetes:443
+ `spark.submit.deployMode`: client
+ `spark.kubernetes.driver.pod.name`: The name of the pod where your Zeppelin instance is running.
+ Any other `spark.kubernetes.*` properties you need to get Spark working (see [Running Spark on Kubernetes](./running-on-kubernetes.html)), such as `spark.kubernetes.initcontainer.docker.image`, `spark.kubernetes.driver.docker.image`, and `spark.kubernetes.executor.docker.image`.
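
As a rough illustration, the settings above could be assembled in a script before being entered in the interpreter configuration. The function name, the image name `example/spark-executor:latest`, and the pod name `zeppelin-0` are made up for this sketch. It also assumes the container hostname equals the pod name, which is the Kubernetes default but can be overridden in the Pod spec.

```python
import socket


def in_cluster_client_settings(executor_image, pod_name=None):
    """Build interpreter properties for in-cluster, client mode (illustrative only)."""
    return {
        # spark.app.name is deliberately left out so that Zeppelin picks one.
        "spark.master": "k8s://https://kubernetes:443",
        "spark.submit.deployMode": "client",
        # Assumption: the hostname inside the container is the pod name.
        "spark.kubernetes.driver.pod.name": pod_name or socket.gethostname(),
        "spark.kubernetes.executor.docker.image": executor_image,
    }


# Hypothetical image and pod names, for demonstration only.
settings = in_cluster_client_settings(
    "example/spark-executor:latest", pod_name="zeppelin-0"
)
for key, value in sorted(settings.items()):
    print(f"{key}={value}")
```

Passing `pod_name` explicitly avoids relying on the hostname assumption.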

## In-Cluster with Spark-Cluster

![In-Cluster with Spark-Cluster](/img/zeppelin_in-cluster_spark-cluster.png "In-Cluster with Spark-Cluster")

Build a new Zeppelin based on [#2637 Spark interpreter on a Kubernetes cluster](https://github.com/apache/zeppelin/pull/2637).

Zeppelin

Member Author

fixed

this one doesn't seem to be updated...?

Member Author

done now.


Once done, deploy that new build in a Kubernetes Pod with the following interpreter settings:

+ `spark.app.name`: Do not set any name; Zeppelin will pick one for you.
+ `spark.master`: k8s://https://kubernetes:443
+ `spark.submit.deployMode`: cluster
+ `spark.kubernetes.driver.pod.name`: Do not set this property.
+ Any other `spark.kubernetes.*` properties you need to get Spark working (see [Running Spark on Kubernetes](./running-on-kubernetes.html)), such as `spark.kubernetes.initcontainer.docker.image`, `spark.kubernetes.driver.docker.image`, and `spark.kubernetes.executor.docker.image`.
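
Because cluster mode requires that two properties stay unset, a quick sanity check on a settings dictionary can catch mistakes early. The following is a minimal sketch; `check_cluster_mode_settings` is a hypothetical helper, not something Zeppelin provides.

```python
def check_cluster_mode_settings(settings):
    """Return a list of problems found in a dict of interpreter properties.

    Illustrative check only; it mirrors the cluster-mode bullet list above.
    """
    problems = []
    if settings.get("spark.submit.deployMode") != "cluster":
        problems.append("spark.submit.deployMode must be 'cluster'")
    if not settings.get("spark.master", "").startswith("k8s://"):
        problems.append("spark.master must start with 'k8s://'")
    # These two properties must be left unset in cluster mode.
    for key in ("spark.app.name", "spark.kubernetes.driver.pod.name"):
        if key in settings:
            problems.append(f"{key} should not be set")
    return problems


good = {
    "spark.master": "k8s://https://kubernetes:443",
    "spark.submit.deployMode": "cluster",
}
bad = dict(good, **{"spark.kubernetes.driver.pod.name": "zeppelin-0"})

print(check_cluster_mode_settings(good))  # []
print(check_cluster_mode_settings(bad))
```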

## Out-Cluster with Spark-Cluster

![Out-Cluster with Spark-Cluster](/img/zeppelin_out-cluster_spark-cluster.png "Out-Cluster with Spark-Cluster")

Build a new Spark and its associated Docker images based on [#2637 Spark interpreter on a Kubernetes cluster](https://github.com/apache/zeppelin/pull/2637).

Once done, any vanilla Apache Zeppelin deployed in a Kubernetes Pod (you can use a Helm chart for this) will work out of the box with the following interpreter settings:

Does this Helm chart work for this (using a different image for a newer Zeppelin, though)?
https://github.com/kubernetes/charts/blob/master/stable/spark/templates/spark-zeppelin-deployment.yaml

shall we link it?

Member Author

I have added a section at the end "how to test" and linked to the chart.


+ `spark.app.name`: Do not set any name; Zeppelin will pick one for you.
+ `spark.master`: k8s://https://ip-address-of-the-kube-api:6443 (port may depend on your setup)
+ `spark.submit.deployMode`: cluster
+ `spark.kubernetes.driver.pod.name`: Do not set this property.
+ Any other `spark.kubernetes.*` properties you need to get Spark working (see [Running Spark on Kubernetes](./running-on-kubernetes.html)), such as `spark.kubernetes.initcontainer.docker.image`, `spark.kubernetes.driver.docker.image`, and `spark.kubernetes.executor.docker.image`.
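
Since the API server address and port vary between setups, the `spark.master` value can be built from those two pieces. The helper below is a hypothetical sketch; `192.0.2.10` is a documentation-only placeholder address, not a real endpoint.

```python
def kube_master_url(api_host, api_port=6443):
    """Format a spark.master value for a Kubernetes API server.

    Illustrative helper: 6443 is used as the default here, but the port
    depends on your setup.
    """
    return f"k8s://https://{api_host}:{api_port}"


# Placeholder address from the documentation range; replace with your own.
print(kube_master_url("192.0.2.10"))       # k8s://https://192.0.2.10:6443
print(kube_master_url("kubernetes", 443))  # k8s://https://kubernetes:443
```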