
Add option to download Hadoop from a custom URL #71

Closed
nchammas opened this issue Dec 15, 2015 · 9 comments
Labels
good first issue Good issues for new contributors to tackle

Comments

@nchammas
Owner

As a follow-up to the discussion in #66, perhaps we should add an option to let users download Hadoop from a custom URL.

  • Command-line option: --hdfs-download-source

  • Config file option:

    modules:
      hdfs:
        version: 2.7.1
        download-source: "http://www.apache.org/dyn/closer.lua/hadoop/common/hadoop-{v}/hadoop-{v}.tar.gz?as_json"

{v} will be replaced by the HDFS version passed in to Flintrock, either via the config file or via the command line.

The .../dyn/closer.lua/... value will be Flintrock's internal default, which the user can replace with a specific Apache mirror, or some other source. The only requirements are that the package be downloadable from the cluster without authentication, and that the URL contain the {v} template somewhere.
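The templating described above could be sketched roughly like this (the function and constant names here are illustrative only, not Flintrock's actual code):

```python
# Hypothetical sketch of resolving the download-source template.
# Names are illustrative; Flintrock's real implementation may differ.
DEFAULT_DOWNLOAD_SOURCE = (
    "http://www.apache.org/dyn/closer.lua/hadoop/common/"
    "hadoop-{v}/hadoop-{v}.tar.gz?as_json"
)


def resolve_download_url(template: str, version: str) -> str:
    """Substitute the HDFS version into the download-source template."""
    # Enforce the stated requirement that the URL contain {v} somewhere.
    if "{v}" not in template:
        raise ValueError(
            "download-source must contain the {v} version placeholder: "
            + template
        )
    return template.replace("{v}", version)


# Example: resolving the default template for HDFS 2.7.1.
url = resolve_download_url(DEFAULT_DOWNLOAD_SOURCE, "2.7.1")
# -> "http://www.apache.org/dyn/closer.lua/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz?as_json"
```

A custom mirror would work the same way: the user supplies their own URL containing `{v}`, and Flintrock fills in the configured version.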

I would like to deprecate this option as soon as we have a more robust way of downloading Hadoop quickly and reliably, since slow, unreliable downloads are the main motivation for adding this option in the first place.

Related to #66, #69, #84.

@nchammas
Owner Author

@ericmjonas - To continue our discussion from #84 about an Apache mirror blacklist here, what mirrors have you noticed take forever to serve up Hadoop?

The mirrors I currently have on my list are:

  • 104.45.233.178
  • 74.206.97.82

@ericmjonas
Contributor

I have to be honest, I haven't actually cataloged them yet -- I normally get so frustrated that I abort and immediately go to hacking download_hadoopy.py. I'm sorry :(


@nchammas
Owner Author

OK, so we can say that at least for you, the download source option described here would work really well. Right? 😀

Perhaps we should get that working first, and then later revisit the idea of a mirror blacklist, or some kind of speculative download-retry mechanism for when a download seems like it's gonna take forever.

What do you say?

@ericmjonas
Contributor

That sounds great!

@nchammas
Owner Author

I agree with you that the out-of-box experience is critical. For the record, I'm hesitant to dive into these other approaches because:

  • Speculative download-retry: This would be the "ultimate" solution, but it seems like it would cost a lot in terms of complexity. We'd need a way to monitor the download rate, kill it based on some heuristic, and then try a new download. Seems like a lot of work to me for a relatively limited feature.
  • Mirror blacklist: I'm more open to this, but we have this initial problem of finding and documenting the bad mirrors. I think a prerequisite to making this useful is to implement the work described in #27 (Implement a more lightweight display of launch/start/etc. progress) so that it's easy for users to identify stragglers during launch and see if it's an Apache mirror that is slowing things down.
  • Torrent-style download: This is something I looked into a couple of months ago but couldn't get to work. It's another potential "correct" solution. Basically, have Flintrock use multiple mirrors to download Hadoop, getting each piece of the file from a different mirror. We need some library to make this easy though; this would not be something I would want to build for Flintrock.
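The torrent-style idea above amounts to splitting the file into byte ranges and fetching each range from a different mirror. A rough sketch of just the planning step (this is a hypothetical illustration, not Flintrock code; actually fetching the pieces would use HTTP Range requests and then concatenate the chunks in order):

```python
# Illustrative sketch of planning a multi-mirror download: divide a file
# of known size into fixed-size byte ranges and assign each range to a
# mirror round-robin. Each (mirror, start, end) tuple describes one
# inclusive byte range that could be fetched with an HTTP Range header.
def plan_ranged_download(file_size, mirrors, chunk_size=64 * 1024 * 1024):
    plan = []
    start = 0
    i = 0
    while start < file_size:
        end = min(start + chunk_size, file_size) - 1  # inclusive end byte
        plan.append((mirrors[i % len(mirrors)], start, end))
        start = end + 1
        i += 1
    return plan
```

For example, a 150-byte file split into 100-byte chunks across two mirrors yields two ranges: bytes 0-99 from the first mirror and bytes 100-149 from the second. The hard parts this sketch skips -- retries, slow-mirror detection, and verifying the reassembled file -- are exactly why a ready-made library would be preferable.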

@nchammas
Owner Author

@ericmjonas - I'm currently focused on getting #77 wrapped up and then tackling #16 for 0.4.0.

If you want to take a crack at a PR for this in the meantime, be my guest. I've updated the issue description with my latest thoughts on how this would work.

@BenFradet
Contributor

@nchammas @ericmjonas Do you guys mind if I take this?

@nchammas
Owner Author

Fine by me @BenFradet. Have you run into the issues described here btw? Do you agree with the proposed solution?

@BenFradet
Contributor

Nope, I haven't run into those issues with Flintrock per se, but I do have my own mirror, which I have found more reliable than Apache's in the past.
