LIVY-246 Support multiple Spark home in runtime#318

Open
ayushiagarwal wants to merge 6 commits into cloudera:master from ayushiagarwal:LIVY-246

Conversation

@ayushiagarwal

Made code changes to enable multiple Spark versions on Livy for Spark batch jobs. The user can pass sparkVersion at runtime for batch jobs, and SPARK_HOME and SPARK_CONF_DIR will be set according to the given sparkVersion.

The path for livy.server.spark-home_$version needs to be set in livy.conf; the versions added there will be supported.

Assumption:

  • bin/livy-env.sh will continue to have SPARK_HOME and SPARK_CONF_DIR, which will be used as the default Spark version for batch and interactive sessions
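A minimal sketch of the version lookup this PR describes, using a plain Map in place of LivyConf (the config keys follow the PR; the paths and object name are illustrative):

```scala
// Minimal sketch of version-based SPARK_HOME resolution.
// A plain Map stands in for LivyConf; paths are illustrative.
object SparkHomeSketch {
  val conf: Map[String, String] = Map(
    "livy.server.spark-home"       -> "/opt/spark-default",
    "livy.server.spark-home_2.1.0" -> "/opt/spark-2.1.0"
  )

  def sparkHome(version: Option[String] = None): Option[String] =
    version match {
      // A configured version resolves through its own key...
      case Some(v) =>
        conf.get(s"livy.server.spark-home_$v").orElse(
          throw new IllegalArgumentException(s"Spark version: $v is not supported"))
      // ...otherwise fall back to the default home, then the environment.
      case None =>
        conf.get("livy.server.spark-home").orElse(sys.env.get("SPARK_HOME"))
    }

  def main(args: Array[String]): Unit = {
    assert(sparkHome(Some("2.1.0")).contains("/opt/spark-2.1.0"))
    assert(sparkHome().contains("/opt/spark-default"))
    println("ok")
  }
}
```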

@codecov-io

codecov-io commented Apr 26, 2017

Codecov Report

Merging #318 into master will decrease coverage by 2.62%.
The diff coverage is 62.5%.


@@             Coverage Diff              @@
##             master     #318      +/-   ##
============================================
- Coverage     70.46%   67.84%   -2.63%     
+ Complexity      713      686      -27     
============================================
  Files            96       96              
  Lines          5079     5091      +12     
  Branches        754      756       +2     
============================================
- Hits           3579     3454     -125     
- Misses          995     1156     +161     
+ Partials        505      481      -24
Impacted Files Coverage Δ Complexity Δ
...loudera/livy/server/batch/CreateBatchRequest.scala 100% <100%> (ø) 18 <1> (+1) ⬆️
.../com/cloudera/livy/server/batch/BatchSession.scala 85.18% <100%> (-2%) 11 <0> (-2)
...er/src/main/scala/com/cloudera/livy/LivyConf.scala 87.96% <50%> (-6.44%) 15 <4> (-1)
.../com/cloudera/livy/utils/SparkProcessBuilder.scala 53.84% <50%> (-3.3%) 11 <1> (-1)
...in/scala/com/cloudera/livy/server/LivyServer.scala 2% <0%> (-32.67%) 2% <0%> (-5%)
.../main/scala/com/cloudera/livy/utils/SparkApp.scala 61.53% <0%> (-19.24%) 1% <0%> (ø)
...in/java/com/cloudera/livy/rsc/ContextLauncher.java 67.16% <0%> (-13.44%) 14% <0%> (-5%)
.../com/cloudera/livy/utils/LineBufferedProcess.scala 60% <0%> (-13.34%) 5% <0%> (-2%)
...a/livy/server/interactive/InteractiveSession.scala 54.63% <0%> (-9.27%) 34% <0%> (-8%)
...ore/src/main/scala/com/cloudera/livy/Logging.scala 63.63% <0%> (-9.1%) 0% <0%> (ø)
... and 19 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7212e3f...da6edff. Read the comment docs.

Contributor

@ajbozarth ajbozarth left a comment


I'm not the best person to review how well this accomplishes its purpose, but I've given config-related feedback where I can

version.map {version => Option(get(s"livy.server.spark-home_$version"))
.orElse(throw new IllegalArgumentException(
s"Spark version: $version is not supported"))
}.getOrElse(Option(get(s"livy.server.spark-home")).orElse(sys.env.get("SPARK_HOME")))
Contributor


I think we should still use SPARK_HOME here as before, not livy.server.spark-home directly

Comment thread conf/livy.conf.template Outdated
# livy.server.spark-home_1.5.2=<path>/spark-1.5.2
# livy.server.spark-home_1.6.3=<path>/spark-1.6.3
# livy.server.spark-home_2.0.1=<path>/spark-2.0.1
# livy.server.spark-home_2.1.0=<path>/spark-2.1.0
Contributor


the _ should be a - to match conf style

val builder = new SparkProcessBuilder(livyConf)
val builder = new SparkProcessBuilder(livyConf, request.sparkVersion)
request.sparkVersion.map({ value =>
builder.env("SPARK_CONF_DIR", livyConf.sparkHome(request.sparkVersion)
Contributor


So I'm still figuring out the ins and outs of Livy, but I'm not quite sure why SPARK_CONF_DIR needs to be updated here; where is it used?



Hi @ajbozarth, if we don't set SPARK_CONF_DIR and it is already set in the environment, the older value takes precedence, so the Spark version change might not take effect.
Example on the Hortonworks distribution: in SPARK_CONF_DIR/spark-defaults.conf, "spark.yarn.jar" is set to "hdfs:///hdp/spark-assembly.1.6.2.jar", so if we don't overwrite SPARK_CONF_DIR, that spark-assembly jar will always be used regardless of the Spark version.
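The precedence issue can be sketched as follows; a plain Map stands in for the environment handed to spark-submit, and the paths and helper name are illustrative:

```scala
// Sketch of the SPARK_CONF_DIR precedence problem: unless the launcher
// explicitly overrides the variable, the inherited value leaks through.
object ConfDirPrecedence {
  def launchEnv(inherited: Map[String, String],
                chosen: Option[String]): Map[String, String] =
    chosen match {
      // Explicitly setting SPARK_CONF_DIR makes the chosen version win.
      case Some(dir) => inherited + ("SPARK_CONF_DIR" -> dir)
      // Otherwise whatever was already in the environment is used as-is.
      case None      => inherited
    }

  def main(args: Array[String]): Unit = {
    val env = Map("SPARK_CONF_DIR" -> "/etc/spark-1.6.2/conf")
    assert(launchEnv(env, None)("SPARK_CONF_DIR") == "/etc/spark-1.6.2/conf")
    assert(launchEnv(env, Some("/opt/spark-2.1.0/conf"))("SPARK_CONF_DIR")
      == "/opt/spark-2.1.0/conf")
    println("ok")
  }
}
```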

/** Return the location of the spark home directory */
def sparkHome(): Option[String] = Option(get(SPARK_HOME)).orElse(sys.env.get("SPARK_HOME"))
def sparkHome(version: Option[String] = None): Option[String] = {
version.map {version => Option(get(s"livy.server.spark-home_$version"))
Contributor


I think we should abstract livy.server.spark-home_$version out here into a def like the vals above. Something along the lines of:
def SPARK_HOME_VER(version: String) = Entry(s"livy.server.spark-home-$version", null)

@ayushiagarwal
Author

Made the suggested changes. Please review

Contributor

@ajbozarth ajbozarth left a comment


Thanks for the updates, added some follow up comments

def sparkDeployMode(): Option[String] = Option(get(LIVY_SPARK_DEPLOY_MODE)).filterNot(_.isEmpty)

/** Return the spark home version */
def SPARK_HOME_VER(version: String): Option[String] = {
Contributor


I was thinking more along the lines of adding
def SPARK_HOME_VER(version: String) = Entry(s"livy.server.spark-home-$version", null)
to the LivyConf object above as a def similar to the vals, then doing Option(get(SPARK_HOME_VER(version))), on which you could then do an orElse throw Exception inside the sparkHome def as before.

Author


If I declare def SPARK_HOME_VER(version: String) = Entry(s"livy.server.spark-home-$version", null) then I can't pass it in def sparkHome(version: Option[String] = None): Option[String] as

  • version.map { version => Option(get(SPARK_HOME_VER(version))).orElse(throw ...) }

since it is a map and it will not accept null values
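The null-handling point under discussion can be sketched as follows; Entry, get, and the paths are simplified stand-ins for the LivyConf versions:

```scala
// Sketch: a null config default becomes None once wrapped in Option,
// so an orElse can raise (or fall back) for unconfigured versions.
object EntrySketch {
  case class Entry(key: String, dflt: String)
  def SPARK_HOME_VER(version: String) = Entry(s"livy.server.spark-home-$version", null)

  // Stand-in for LivyConf.get: returns the configured value or the null default.
  val conf = Map("livy.server.spark-home-2.1.0" -> "/opt/spark-2.1.0")
  def get(e: Entry): String = conf.getOrElse(e.key, e.dflt)

  def main(args: Array[String]): Unit = {
    assert(Option(get(SPARK_HOME_VER("2.1.0"))).contains("/opt/spark-2.1.0"))
    assert(Option(get(SPARK_HOME_VER("0.9.0"))).isEmpty) // null -> None
    println("ok")
  }
}
```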

Contributor


If I'm understanding you correctly, that's where my comment below would fix it. I'll edit the comment below to clarify it.

/** Return the location of the spark home directory */
def sparkHome(): Option[String] = Option(get(SPARK_HOME)).orElse(sys.env.get("SPARK_HOME"))
def sparkHome(version: Option[String] = None): Option[String] = {
version.map {version => SPARK_HOME_VER(version)
Contributor

@ajbozarth ajbozarth Apr 28, 2017


On a second look it may be better to use a match here. Assuming the SPARK_HOME_VER def I suggested above, I recommend something like:

version match {
   case Some(version) => Option(get(SPARK_HOME_VER(version))).orElse(throw ...)
   case None => Option(get(SPARK_HOME)).orElse(sys.env.get("SPARK_HOME"))
}

where the None case is just the old code. This would be easier to read overall.

@ayushiagarwal
Author

Made the suggested changes. Please review

@ayushiagarwal
Author

Moved the version entry in with the rest of the Entry's, and also added code to set up SPARK_CONF_DIR. We need to set SPARK_CONF_DIR, otherwise spark-submit will pick up the SPARK_CONF_DIR from the default environment, so I'm setting it according to the Spark version.

Contributor

@ajbozarth ajbozarth left a comment


Thanks for the fixes. I found one nit and made a comment on the SPARK_CONF_DIR code you added (I understand why it's needed now)

Comment thread conf/livy-env.sh.template Outdated
@@ -1,4 +1,4 @@
#!/usr/bin/env bash
#!/usr/bin/env bash
Contributor


I think this was added by accident


val builder = new SparkProcessBuilder(livyConf)
val builder = new SparkProcessBuilder(livyConf, request.sparkVersion)
request.sparkVersion.map({ value =>
Contributor


You don't need maps here, this should work fine as:

val sparkConfDir = livyConf.sparkHome(request.sparkVersion).map(_ + File.separator + "conf")
builder.env("SPARK_CONF_DIR", sparkConfDir)

Author


I think we will still need a map for builder.env, as sparkConfDir is an Option and the builder.env declaration needs a String. Agreed we can remove the first map of request.sparkVersion. Will make the changes and deploy.
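The Option-to-String point made here can be sketched as follows; setEnv and the path are illustrative stand-ins for builder.env and a real Spark home:

```scala
// Sketch: an env setter takes a plain String, so map/foreach over the
// Option and only set the variable when a value is actually present.
object EnvSketch {
  val env = scala.collection.mutable.Map[String, String]()
  def setEnv(k: String, v: String): Unit = env(k) = v // stand-in for builder.env

  def main(args: Array[String]): Unit = {
    val sparkHome: Option[String] = Some("/opt/spark-2.1.0")
    sparkHome
      .map(_ + java.io.File.separator + "conf")      // derive the conf dir
      .foreach(dir => setEnv("SPARK_CONF_DIR", dir)) // set only if defined
    assert(env("SPARK_CONF_DIR") == "/opt/spark-2.1.0/conf")
    println("ok")
  }
}
```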

@ayushiagarwal
Author

Fixed the suggested changes in SPARK_CONF_DIR. Please review.

Contributor

@ajbozarth ajbozarth left a comment


LGTM

@jerryshao
Contributor

What's the difference compared to #232 ?

@ajbozarth
Contributor

@ayushiagarwal want to reopen this on apache/incubator-livy and address @jerryshao's concern with his similar PR?

@karthikknnatarajan

@ayushiagarwal @ajbozarth
We are looking for such a feature in Livy. Any idea when this will be merged into master? I can help get this change merged in.

@ajbozarth
Contributor

If @ayushiagarwal lets this continue to sit idle for another week, I think it would be OK to reopen this on apache/incubator-livy yourself, @karthikknnatarajan.
No promises on whether it gets merged, though; @jerryshao has an alternate implementation ( #232 ) that he hasn't moved over either.

@karthikknnatarajan

Sounds good. I will drop a note here before I create a new PR on apache/incubator-livy.


6 participants