SPARK

Table of Contents
Deploy SPARK on GlusterFS
Solution Components
Pre-Req
SPARK Config
Using SPARK on GlusterFS

Deploy SPARK on GlusterFS

This guide will show you how to run SPARK on GlusterFS using the glusterfs-hadoop plugin.

Install Hadoop and configure it for GlusterFS by following this guide: Configuring Hadoop For GlusterFS

On each node in the cluster:

Download and unzip SPARK to /opt/spark-<VERSION>
Edit <SPARK_INSTALL>/conf/master to include the IP address of the SPARK master node
Edit <SPARK_INSTALL>/conf/slaves to include the IP address of each slave node in the SPARK cluster
Create or modify <SPARK_INSTALL>/conf/spark-env.sh to set two environment variables: one pointing to the configuration directory of the Hadoop installation set up previously, and one adding the Hadoop lib directory, which holds the glusterfs-hadoop plugin JAR, to the SPARK classpath:

HADOOP_CONF_DIR=<HADOOP_INSTALL>/etc/hadoop

SPARK_CLASSPATH=<HADOOP_INSTALL>/share/common/lib/*:$SPARK_CLASSPATH
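
Because the SPARK_CLASSPATH entry only helps if the plugin JAR is actually present in that directory, a quick check on each node can catch a missing JAR before a job fails. The following is a minimal sketch, not part of SPARK or the plugin: the class name PluginJarCheck is hypothetical, and the file name prefix glusterfs-hadoop is an assumption about how the downloaded JAR is named.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class PluginJarCheck {

    // Returns the names of JAR files in libDir that start with the given prefix.
    // An empty list means the directory is missing or holds no matching JAR.
    static List<String> findPluginJars(Path libDir, String prefix) throws IOException {
        List<String> matches = new ArrayList<>();
        if (!Files.isDirectory(libDir)) {
            return matches;
        }
        // The glob keeps only files such as <prefix>-<anything>.jar
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, prefix + "*.jar")) {
            for (Path jar : jars) {
                matches.add(jar.getFileName().toString());
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PluginJarCheck <path to Hadoop lib directory>");
            return;
        }
        // "glusterfs-hadoop" is an assumed prefix; change it to match your JAR.
        List<String> found = findPluginJars(Paths.get(args[0]), "glusterfs-hadoop");
        if (found.isEmpty()) {
            System.out.println("No glusterfs-hadoop JAR found in " + args[0]);
        } else {
            System.out.println("Found plugin JAR(s): " + found);
        }
    }
}
```

Run it with the same directory used in SPARK_CLASSPATH, without the trailing /* wildcard.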

Use a glusterfs:// URI to access GlusterFS from SPARK:

JavaRDD<String> distFile = sc.textFile("glusterfs:///anyTextFile.txt");
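
The three slashes in the URI above are easy to mistype, so it may help to see how the string breaks down. Hadoop chooses a file system implementation from the URI scheme, and in this URI the authority (host) part is empty, leaving only the scheme and an absolute path. The sketch below uses only the standard java.net.URI class; the class name GlusterUriParts is hypothetical, and it illustrates the URI syntax only, not how the plugin resolves the path.

```java
import java.net.URI;

public class GlusterUriParts {

    // Splits a URI string into scheme, authority, and path.
    static String describe(String text) {
        URI uri = URI.create(text);
        return "scheme=" + uri.getScheme()
                + " authority=" + uri.getAuthority()
                + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // prints: scheme=glusterfs authority=null path=/anyTextFile.txt
        System.out.println(describe("glusterfs:///anyTextFile.txt"));
    }
}
```

With only two slashes (glusterfs://anyTextFile.txt), the file name would be parsed as the authority instead of the path, which is a common cause of "file not found" errors.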