LIVY-246 Support multiple Spark home in runtime #318
ayushiagarwal wants to merge 6 commits into cloudera:master
Conversation
Codecov Report
@@ Coverage Diff @@
## master #318 +/- ##
============================================
- Coverage 70.46% 67.84% -2.63%
+ Complexity 713 686 -27
============================================
Files 96 96
Lines 5079 5091 +12
Branches 754 756 +2
============================================
- Hits 3579 3454 -125
- Misses 995 1156 +161
+ Partials 505 481 -24
ajbozarth left a comment:

I'm not the best to review how well this accomplishes its purpose, but I've given config-related feedback where I can.
    version.map { version => Option(get(s"livy.server.spark-home_$version"))
      .orElse(throw new IllegalArgumentException(
        s"Spark version: $version is not supported"))
    }.getOrElse(Option(get(s"livy.server.spark-home")).orElse(sys.env.get("SPARK_HOME")))

I think we should still use SPARK_HOME here as before, not livy.server.spark-home directly.
    # livy.server.spark-home_1.5.2=<path>/spark-1.5.2
    # livy.server.spark-home_1.6.3=<path>/spark-1.6.3
    # livy.server.spark-home_2.0.1=<path>/spark-2.0.1
    # livy.server.spark-home_2.1.0=<path>/spark-2.1.0

The _ should be a - to match conf style.
    - val builder = new SparkProcessBuilder(livyConf)
    + val builder = new SparkProcessBuilder(livyConf, request.sparkVersion)
    + request.sparkVersion.map({ value =>
    +   builder.env("SPARK_CONF_DIR", livyConf.sparkHome(request.sparkVersion)

So I'm still figuring out the ins and outs of Livy, but I'm not quite sure why SPARK_CONF_DIR needs to be updated here; where is it used?
Hi @ajbozarth, if we are not setting SPARK_CONF_DIR and it is already set in the environment, the older value takes precedence, so a Spark version change might not take effect.
Example on the Hortonworks distribution: SPARK_CONF_DIR/spark-defaults.conf sets "spark.yarn.jar" to "hdfs:///hdp/spark-assembly.1.6.2.jar", so if we do not overwrite SPARK_CONF_DIR, that spark-assembly jar will always be used, regardless of the requested Spark version.
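The precedence issue described above can be sketched in isolation. This is not Livy's builder code; the two maps below are illustrative stand-ins for the inherited process environment and Livy's per-request overrides, with made-up paths.

```scala
// Illustrative sketch (not Livy internals): a child process environment is the
// inherited environment merged with explicit overrides, and overrides win.
val inheritedEnv = Map("SPARK_CONF_DIR" -> "/etc/spark/conf")        // already set on the host
val overrides    = Map("SPARK_CONF_DIR" -> "/opt/spark-2.1.0/conf")  // set by Livy per request

// Without the override, the host value would leak through, pinning spark.yarn.jar
// (and everything else in spark-defaults.conf) to the old version.
val childEnv = inheritedEnv ++ overrides
```

The `++` merge mirrors what happens when a launcher sets an environment variable explicitly before forking spark-submit: the explicit value shadows the inherited one.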
    /** Return the location of the spark home directory */
    - def sparkHome(): Option[String] = Option(get(SPARK_HOME)).orElse(sys.env.get("SPARK_HOME"))
    + def sparkHome(version: Option[String] = None): Option[String] = {
    +   version.map { version => Option(get(s"livy.server.spark-home_$version"))

I think we should abstract livy.server.spark-home_$version out into a def like the vals above. Something along the lines of:

    def SPARK_HOME_VER(version: String) = Entry(s"livy.server.spark-home-$version", null)
Fixed the changes suggested. Please review.
ajbozarth left a comment:

Thanks for the updates, added some follow-up comments.
    def sparkDeployMode(): Option[String] = Option(get(LIVY_SPARK_DEPLOY_MODE)).filterNot(_.isEmpty)

    /** Return the spark home version */
    def SPARK_HOME_VER(version: String): Option[String] = {
I was thinking more along the lines of adding

    def SPARK_HOME_VER(version: String) = Entry(s"livy.server.spark-home-$version", null)

to the LivyConf object above as a def similar to the vals, then doing Option(get(SPARK_HOME_VER(version))), which you could then do an orElse throw on inside the sparkHome def as before.
If I declare def SPARK_HOME_VER(version: String) = Entry(s"livy.server.spark-home-$version", null), then I can't pass it in def sparkHome(version: Option[String] = None): Option[String] as

    version.map { version => Option(get(SPARK_HOME_VER(version))).orElse(throw ...) }

since it is a map and it will not accept null values.
If I'm understanding you correctly, that's where my comment below would fix it. I'll edit the comment below to clarify it.
    /** Return the location of the spark home directory */
    - def sparkHome(): Option[String] = Option(get(SPARK_HOME)).orElse(sys.env.get("SPARK_HOME"))
    + def sparkHome(version: Option[String] = None): Option[String] = {
    +   version.map { version => SPARK_HOME_VER(version)
On a second look it may be better to use a match here. Assuming the SPARK_HOME_VER def I suggested above, something like:

    version match {
      case Some(version) => Option(get(SPARK_HOME_VER(version))).orElse(throw ...)
      case None => Option(get(SPARK_HOME)).orElse(sys.env.get("SPARK_HOME"))
    }

where the None case is just the old code. This would be easier to read overall.
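Putting the suggestion together, here is a self-contained sketch of the resolution logic. A plain Map stands in for LivyConf's store (the real code reads typed Entry values); the key names follow this thread, and the paths are made up.

```scala
// Sketch only: a Map stands in for LivyConf's config store.
val conf = Map(
  "livy.server.spark-home"       -> "/opt/spark-default", // hypothetical default home
  "livy.server.spark-home-2.1.0" -> "/opt/spark-2.1.0"    // hypothetical versioned home
)

// Build the versioned key, matching the suggested SPARK_HOME_VER shape.
def SPARK_HOME_VER(version: String): String = s"livy.server.spark-home-$version"

def sparkHome(version: Option[String] = None): Option[String] =
  version match {
    // A requested version must be configured, otherwise it is unsupported.
    case Some(v) =>
      conf.get(SPARK_HOME_VER(v)).orElse(
        throw new IllegalArgumentException(s"Spark version: $v is not supported"))
    // No version requested: fall back to the old behavior.
    case None =>
      conf.get("livy.server.spark-home").orElse(sys.env.get("SPARK_HOME"))
  }
```

The throw inside orElse only fires when the versioned key is absent, because orElse takes its argument by name.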
Fixed the suggested changes. Please review.
Moved the version with the rest of the Entry vals and also added code to set up SPARK_CONF_DIR. We need to set SPARK_CONF_DIR, otherwise spark-submit will take the SPARK_CONF_DIR from the default environment, so I am setting it according to the Spark version.
ajbozarth left a comment:

Thanks for the fixes. I found one nit and made a comment on the SPARK_CONF_DIR code you added (I understand why it's needed now).
    @@ -1,4 +1,4 @@
    - #!/usr/bin/env bash
    + #!/usr/bin/env bash

I think this was added by accident.
    - val builder = new SparkProcessBuilder(livyConf)
    + val builder = new SparkProcessBuilder(livyConf, request.sparkVersion)
    + request.sparkVersion.map({ value =>
You don't need maps here; this should work fine as:

    val sparkConfDir = livyConf.sparkHome(request.sparkVersion).map(_ + File.separator + "conf")
    builder.env("SPARK_CONF_DIR", sparkConfDir)
I think we will still need a map for builder.env, as sparkConfDir is an Option and builder.env expects a String. Agreed we can remove the first map of request.sparkVersion. Will make the changes and deploy.
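The agreed shape can be sketched as follows. A mutable map plays the role of builder.env here, since the real SparkProcessBuilder is not available in isolation; the names and the derivation of the conf dir are taken from this thread, not verified against the merged code.

```scala
import java.io.File

// Derive the conf dir from the resolved Spark home; stays None when no home was found.
def sparkConfDir(sparkHome: Option[String]): Option[String] =
  sparkHome.map(_ + File.separator + "conf")

// builder.env takes a String, so unwrap the Option before the call:
// the foreach below is the "map for builder.env" discussed above.
val env = scala.collection.mutable.Map[String, String]()  // stand-in for builder.env
sparkConfDir(Some("/opt/spark-2.1.0")).foreach(dir => env("SPARK_CONF_DIR") = dir)
```

Using foreach means no env entry is written at all when Spark home is unresolved, which matches the pre-existing behavior of leaving the inherited environment untouched.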
Fixed the suggested changes in SPARK_CONF_DIR. Please review.
What's the difference compared to #232?
@ayushiagarwal want to reopen this on apache/incubator-livy and address @jerryshao's concern with his similar PR?
@ayushiagarwal @ajbozarth
If @ayushiagarwal lets this continue to sit idle for another week, I think it would be OK to reopen this on apache/incubator-livy yourself, @karthikknnatarajan.
Sounds good. I will drop a note here before I create a new PR on apache/incubator-livy.
Made code changes enabling multiple Spark versions on Livy for Spark batch jobs. Users can pass sparkVersion at run time for batch jobs, and SPARK_HOME and SPARK_CONF_DIR will be set according to the given sparkVersion.
The path for livy.server.spark-home_$version needs to be set in livy.conf; only the versions added there are supported.
Assumption: