This repository has been archived by the owner on Dec 21, 2023. It is now read-only.

Hadoop first draft

Eran Tamir edited this page Feb 8, 2016 · 4 revisions
  1. Install a Hadoop machine. I used Ambari: https://cwiki.apache.org/confluence/display/AMBARI/Install+Ambari+2.2.0+from+Public+Repositories

  2. There are two relevant files: one for credentials, and another for the REST endpoint and settings. Edit both files to reflect your NooBaa endpoint and credentials.
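The draft does not name the two files. For the classic s3n connector they would typically be Hadoop's core-site.xml (credentials) and a jets3t.properties on the classpath (endpoint and settings); the file names, property keys, and values below are assumptions for illustration, not taken from this draft:

```xml
<!-- Hypothetical core-site.xml fragment: NooBaa access keys for the s3n connector -->
<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_NOOBAA_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_NOOBAA_SECRET_KEY</value>
  </property>
</configuration>
```

and a jets3t.properties pointing s3n at the NooBaa endpoint instead of AWS (IP taken from the s3fs command below):

```properties
# Hypothetical jets3t.properties: route s3n to the NooBaa endpoint
s3service.s3-endpoint=146.148.44.71
s3service.disable-dns-buckets=true
s3service.https-only=false
```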

  3. Install s3fs (https://github.com/s3fs-fuse/s3fs-fuse) and create an output folder on NooBaa:

    mkdir /hadoop-out

    /usr/local/bin/s3fs hadoop-out /hadoop-out -o passwd_file=passwd -o use_path_request_style -o url=http://146.148.44.71/ -o sigv2 -o parallel_count=8
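The mount above expects the passwd file referenced by `passwd_file=`; s3fs uses a plain `ACCESS_KEY:SECRET_KEY` format and rejects the file unless its permissions are restricted. A minimal sketch with placeholder credentials (the keys and the /tmp path are illustrative only):

```shell
# Placeholder NooBaa credentials -- replace with your real access/secret keys
mkdir -p /tmp/s3fs-demo
printf 'AKIAEXAMPLEKEY:exampleSecretKey\n' > /tmp/s3fs-demo/passwd

# s3fs refuses world-readable credential files, so lock it down
chmod 600 /tmp/s3fs-demo/passwd
ls -l /tmp/s3fs-demo/passwd
```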

  4. Test by creating a bucket and input/output folders. In this example, the bucket name is hadoop; the job reads data from the input folder and writes its output to the output folder.

    cd /usr/local/hadoop

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar wordcount s3n://hadoop/input file:///hadoop-out
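It helps to know what output to expect before wiring in S3: wordcount emits one `word<TAB>count` line per distinct word into the output folder. The hadoop jar itself is not run here; this is just a local approximation of the same output using standard shell tools, with a made-up input file:

```shell
# Hypothetical sample input -- two lines, three distinct words
mkdir -p /tmp/wc-demo
printf 'hello world\nhello hadoop\n' > /tmp/wc-demo/input.txt

# Approximate wordcount's "word<TAB>count" output: split into words, count duplicates
tr -s ' ' '\n' < /tmp/wc-demo/input.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
```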

---------------- TEMP ----

Download Hadoop:

    wget http://www.us.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz

    tar -xvf hadoop-2.7.2.tar.gz -C /usr/local
    cd /usr/local
    mv hadoop-2.7.2 hadoop

Then edit /usr/local/hadoop/etc/hadoop/hadoop-env.sh:

  1. Set JAVA_HOME (point it at the installed Java).

  2. Update HADOOP_CLASSPATH:

    if [ "$HADOOP_CLASSPATH" ]; then
      export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f:$HADOOP_HOME/share/hadoop/tools/lib/*
    else
      export HADOOP_CLASSPATH=$f:$HADOOP_HOME/share/hadoop/tools/lib/*
    fi
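For step 1, the JAVA_HOME line in hadoop-env.sh would look something like this; the path below is a hypothetical example, so point it at whichever JDK is actually installed (e.g. check `readlink -f $(which java)`):

```shell
# Hypothetical JDK location -- substitute your installed Java home
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
echo "$JAVA_HOME"
```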
