# Setting up Apache Spark in Kubernetes

**Spark on K8s tutorial**

Check the YouTube video for setting up Spark.

## Setting up Spark using Helm

- Go to `rbac/spark-rbac.yaml`.

RBAC (Role-Based Access Control) defines user access privileges. Kubernetes RBAC is REST-based and maps HTTP verbs to permissions.

A RoleBinding grants permissions within a specific namespace, whereas a ClusterRoleBinding grants that access cluster-wide.
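
For illustration, a minimal `spark-rbac.yaml` might look like the sketch below. The service-account name `spark`, the `default` namespace, and the exact rule list are assumptions here; check the actual file in the repo.

```yaml
# Service account the Spark driver runs as (name/namespace assumed)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: default
---
# Role: namespaced permissions the driver needs to manage executor pods
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-role
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
# RoleBinding: grants the Role to the service account within this namespace only;
# a ClusterRoleBinding would grant the access cluster-wide instead
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: default
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: default
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
```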

- Go to `helm_values/sparkoperator_values.yaml`. Read more.

## Exploring `sparkoperator_values.yaml`

1. `createRole` and `createClusterRole` are set to `true`.
2. For now, we don't enable monitoring via Grafana or an external service, so `metrics` and `podMonitor` are set to `false`.
3. `resources` depends entirely on your system/Docker capacity; change it accordingly:
```yaml
resources:
  limits:
    cpu: 2000m
    memory: 8000Mi
  requests:
    cpu: 200m
    memory: 100Mi
```
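
Taken together, the relevant part of `sparkoperator_values.yaml` might look roughly like this (the key names follow the spark-on-k8s-operator Helm chart; treat it as a sketch rather than the exact file contents):

```yaml
rbac:
  createRole: true          # namespaced Role for the operator
  createClusterRole: true   # cluster-wide permissions the operator needs

metrics:
  enable: false             # no Grafana/external monitoring for now

podMonitor:
  enable: false             # Prometheus PodMonitor disabled as well

# resources: see the limits/requests block above
```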
- Install the Spark Operator Helm chart:

```sh
$ helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator

$ helm install spark-operator spark-operator/spark-operator -n default -f sparkoperator_values.yaml --create-namespace
```

Spark will create all of its pods inside the `spark` namespace only.

### Test 1

- Test the application by running `kubectl apply -f examples/spark/pi.yaml -n default`. Check the logs; a value of pi will be logged if the test passes.
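
For reference, a `SparkApplication` manifest like `pi.yaml` typically looks something like this sketch (the image tag, jar path, and Spark version are assumptions; the repo's actual example may differ):

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster                # the driver itself runs as a pod in the cluster
  image: gcr.io/spark-operator/spark:v3.1.1   # assumed image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar  # assumed jar path
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark      # the account from spark-rbac.yaml
  executor:
    cores: 1
    instances: 1
    memory: 512m
```

Once applied, the driver logs (e.g. `kubectl logs spark-pi-driver -n default`) should contain a line like `Pi is roughly 3.14...` on success.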

### Test 2

- Log in to MinIO and choose the `test-files` bucket.
- Upload any temporary file to the bucket.
- Go to the directory `../examples/spark`.
- Execute the Spark wordcount job: `kubectl apply -f examples/spark/wordcount.yaml -n default`. Spark should be able to read from MinIO, which works like AWS S3 (see the config sketch below).
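
To make that work, the `wordcount.yaml` spec needs S3A settings pointing at MinIO, along the lines of the fragment below (the endpoint and credentials are placeholders; the repo may inject them via Secrets instead):

```yaml
# Fragment of a SparkApplication spec: S3A options that let Spark read from MinIO
spec:
  sparkConf:
    spark.hadoop.fs.s3a.endpoint: "http://minio:9000"     # placeholder MinIO endpoint
    spark.hadoop.fs.s3a.access.key: "minio-access-key"    # placeholder credential
    spark.hadoop.fs.s3a.secret.key: "minio-secret-key"    # placeholder credential
    spark.hadoop.fs.s3a.path.style.access: "true"         # MinIO needs path-style addressing
    spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
```

The job can then read its input with paths like `s3a://test-files/<your-file>`.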