CS522_BigData

Implement MapReduce algorithms, including the Pair, Stripe, and Hybrid approaches, for the Word Co-Occurrence and Relative Frequency problems.
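For reference, the Relative Frequency problem is usually stated as below. Here N(w_i, w_j) is the number of times w_j occurs in the neighborhood of w_i. How a neighborhood is defined (a fixed window, the rest of the line, and so on) is an assumption, because this README does not specify it. The worked example assumes b and c are the only neighbors ever seen for a.

```latex
f(w_j \mid w_i) = \frac{N(w_i, w_j)}{\sum_{w'} N(w_i, w')}
\qquad \text{e.g. } N(a, b) = 2,\; N(a, c) = 1 \;\Rightarrow\; f(b \mid a) = \frac{2}{2 + 1} = \frac{2}{3}
```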

Run MapReduce jobs using Spark.

Reference:

Project 1: Pair, Stripe, and Hybrid Approaches for WordCount


This project implements the following MapReduce programs; the Pair, Stripe, and Hybrid approaches solve the Word Co-Occurrence and Relative Frequency problems.

  • In-Mapper WordCount
  • Average
  • In-Mapper Average
  • Pair Approach
  • Stripe Approach
  • Hybrid Approach
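The idea behind In-Mapper WordCount can be sketched without Hadoop. Each mapper keeps a HashMap for its whole input split and emits every (word, partialCount) once, in a cleanup step, instead of emitting (word, 1) per token. This is a plain-Java simulation, not the repository's code: the class InMapperWordCount, its nested Mapper, the run method, the [a-z0-9] tokenization, and modeling a split as a list of lines are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of in-mapper combining for WordCount (no Hadoop classes involved).
class InMapperWordCount {

    // One Mapper instance per input split; its HashMap persists across map() calls.
    static class Mapper {
        private final Map<String, Integer> counts = new HashMap<>();

        // map(): aggregate locally instead of emitting (word, 1) for every token.
        void map(String line) {
            for (String word : line.toLowerCase().split("[^a-z0-9]+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }

        // cleanup(): emit each (word, partialCount) once, after the whole split is read.
        Map<String, Integer> cleanup() {
            return counts;
        }
    }

    // One mapper per split, group partial counts by word (shuffle), then sum them (reduce).
    static Map<String, Integer> run(List<List<String>> splits) {
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (List<String> split : splits) {
            Mapper mapper = new Mapper();
            for (String line : split) {
                mapper.map(line);
            }
            for (Map.Entry<String, Integer> e : mapper.cleanup().entrySet()) {
                shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffled.entrySet()) {
            int sum = 0;
            for (int partial : e.getValue()) {
                sum += partial;
            }
            totals.put(e.getKey(), sum);
        }
        return totals;
    }
}
```

With the splits ["a b a", "b c"] and ["a c c"], run returns a=3, b=2, c=3. The two mappers send only five partial counts through the shuffle instead of eight (word, 1) pairs. The trade-off is that each mapper must hold its split's vocabulary in memory.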
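For Average and In-Mapper Average, the key point is that partial averages cannot be combined. Partial (sum, count) pairs can. In the plain Average job, the mapper emits (key, value) and the reducer divides. A combiner or in-mapper version must therefore carry (sum, count) and divide only once, in the reducer. The sketch below shows the in-mapper variant in plain Java. The input format "key value" per record and all names (InMapperAverage, SumCount, mapSplit, run) are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of in-mapper combining for per-key averages.
class InMapperAverage {

    // Partial aggregate: averages cannot be merged, but (sum, count) pairs can.
    static final class SumCount {
        double sum;
        long count;

        void add(double value, long n) {
            sum += value;
            count += n;
        }
    }

    // Mapper for one split; each record is assumed to be "key value" (hypothetical format).
    static Map<String, SumCount> mapSplit(List<String> records) {
        Map<String, SumCount> partial = new HashMap<>();
        for (String record : records) {
            String[] fields = record.trim().split("\\s+");
            partial.computeIfAbsent(fields[0], k -> new SumCount())
                   .add(Double.parseDouble(fields[1]), 1);
        }
        return partial; // emitted once per key in cleanup()
    }

    // Reducer: merge (sum, count) pairs per key, then divide exactly once.
    static Map<String, Double> run(List<List<String>> splits) {
        Map<String, SumCount> merged = new TreeMap<>();
        for (List<String> split : splits) {
            for (Map.Entry<String, SumCount> e : mapSplit(split).entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new SumCount())
                      .add(e.getValue().sum, e.getValue().count);
            }
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, SumCount> e : merged.entrySet()) {
            averages.put(e.getKey(), e.getValue().sum / e.getValue().count);
        }
        return averages;
    }
}
```

Given the splits ["x 1", "x 2", "y 10"] and ["x 6"], the average for x is 3.0. Averaging the two per-split averages (1.5 and 6) would give a wrong 3.75.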
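The Pair approach to relative frequency is commonly paired with "order inversion". The mapper emits ((w, u), 1) for each co-occurrence plus a special ((w, *), 1). Sorting then delivers the marginal (w, *) to the reducer before any (w, u). The sketch below simulates this in plain Java with a single reducer. With several reducers, a partitioner on the left word alone would also be needed. The neighbor rule (the words after w in the same line, up to the next occurrence of w), the string key encoding "w u", and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the Pair approach with order inversion for relative frequency.
class PairRelativeFrequency {
    // Special right-hand key for the marginal; '*' sorts before [a-z0-9] in string order.
    static final String MARGINAL = "*";

    // Lowercase and keep only [a-z0-9] runs so that "*" and the " " separator sort first.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : line.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Mapper: for each neighbor u of w, emit ((w, u), 1) and ((w, *), 1).
    // Hypothetical neighbor rule: words after w in the same line, up to the next occurrence of w.
    static List<String> map(String line) {
        List<String> emitted = new ArrayList<>();
        List<String> tokens = tokenize(line);
        for (int i = 0; i < tokens.size(); i++) {
            String w = tokens.get(i);
            for (int j = i + 1; j < tokens.size() && !tokens.get(j).equals(w); j++) {
                emitted.add(w + " " + tokens.get(j));
                emitted.add(w + " " + MARGINAL);
            }
        }
        return emitted;
    }

    // Shuffle (sorted keys, single reducer) + reduce: (w, *) arrives before every (w, u).
    static Map<String, Double> run(List<String> lines) {
        TreeMap<String, Integer> shuffled = new TreeMap<>();
        for (String line : lines) {
            for (String pair : map(line)) {
                shuffled.merge(pair, 1, Integer::sum);
            }
        }
        Map<String, Double> frequencies = new TreeMap<>();
        double marginal = 0;
        for (Map.Entry<String, Integer> e : shuffled.entrySet()) {
            String right = e.getKey().split(" ")[1];
            if (right.equals(MARGINAL)) {
                marginal = e.getValue(); // total co-occurrence count of the left word
            } else {
                frequencies.put(e.getKey(), e.getValue() / marginal);
            }
        }
        return frequencies;
    }
}
```

For the lines "a b a c" and "a b", the reducer sees (a, *)=3 first, then (a, b)=2 and (a, c)=1. It outputs f(b|a)=2/3 and f(c|a)=1/3.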
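In the Stripe approach, the mapper emits one associative array ("stripe") per word occurrence, mapping each neighbor to its count. The reducer sums the stripes element-wise, so the marginal is simply the stripe's total and no special (w, *) key is needed. The plain-Java sketch below uses the same hypothetical neighbor rule as a Pair sketch would (words after w in the same line, up to the next occurrence of w). The class and method names are illustrative only.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the Stripe approach for relative frequency.
class StripeRelativeFrequency {

    // Lowercase and keep only [a-z0-9] runs as tokens.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : line.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Mapper: for each occurrence of w, emit (w, stripe) where stripe[u] counts neighbor u.
    // Hypothetical neighbor rule: words after w in the same line, up to the next occurrence of w.
    static List<Map.Entry<String, Map<String, Integer>>> map(String line) {
        List<Map.Entry<String, Map<String, Integer>>> emitted = new ArrayList<>();
        List<String> tokens = tokenize(line);
        for (int i = 0; i < tokens.size(); i++) {
            String w = tokens.get(i);
            Map<String, Integer> stripe = new HashMap<>();
            for (int j = i + 1; j < tokens.size() && !tokens.get(j).equals(w); j++) {
                stripe.merge(tokens.get(j), 1, Integer::sum);
            }
            if (!stripe.isEmpty()) {
                emitted.add(new SimpleEntry<>(w, stripe));
            }
        }
        return emitted;
    }

    // Reducer: element-wise sum of every stripe for w; the marginal is the stripe's total.
    static Map<String, Map<String, Double>> run(List<String> lines) {
        Map<String, Map<String, Integer>> summed = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Map<String, Integer>> e : map(line)) {
                Map<String, Integer> total = summed.computeIfAbsent(e.getKey(), k -> new TreeMap<>());
                e.getValue().forEach((u, c) -> total.merge(u, c, Integer::sum));
            }
        }
        Map<String, Map<String, Double>> frequencies = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : summed.entrySet()) {
            int marginal = 0;
            for (int c : e.getValue().values()) {
                marginal += c;
            }
            Map<String, Double> row = new TreeMap<>();
            for (Map.Entry<String, Integer> cell : e.getValue().entrySet()) {
                row.put(cell.getKey(), cell.getValue() / (double) marginal);
            }
            frequencies.put(e.getKey(), row);
        }
        return frequencies;
    }
}
```

Stripes shrink the number of emitted records but make each value larger. A very frequent word with many distinct neighbors can produce a stripe that is expensive to hold in the reducer.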
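A common formulation of the Hybrid approach emits pairs from the mapper, here with in-mapper combining. The reducer then rebuilds a stripe for the current left word, relying on sorted keys, and normalizes the stripe whenever the left word changes, so no (w, *) marker is needed. Whether this matches the repository's Hybrid job is an assumption. The plain-Java sketch below uses a hypothetical neighbor rule (words after w in the same line, up to the next occurrence of w), a string key "w u", and illustrative names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the Hybrid approach: pairs in the mapper, stripes in the reducer.
class HybridRelativeFrequency {

    // Lowercase and keep only [a-z0-9] runs so keys "w u" sort grouped by w.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : line.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Mapper with in-mapper combining: count (w, u) pairs for the whole split, emit in cleanup().
    // Hypothetical neighbor rule: words after w in the same line, up to the next occurrence of w.
    static Map<String, Integer> mapSplit(List<String> lines) {
        Map<String, Integer> pairCounts = new HashMap<>();
        for (String line : lines) {
            List<String> tokens = tokenize(line);
            for (int i = 0; i < tokens.size(); i++) {
                String w = tokens.get(i);
                for (int j = i + 1; j < tokens.size() && !tokens.get(j).equals(w); j++) {
                    pairCounts.merge(w + " " + tokens.get(j), 1, Integer::sum);
                }
            }
        }
        return pairCounts;
    }

    // Turn a stripe of raw counts into relative frequencies.
    static Map<String, Double> normalize(Map<String, Integer> stripe) {
        int marginal = 0;
        for (int c : stripe.values()) {
            marginal += c;
        }
        Map<String, Double> row = new TreeMap<>();
        for (Map.Entry<String, Integer> cell : stripe.entrySet()) {
            row.put(cell.getKey(), cell.getValue() / (double) marginal);
        }
        return row;
    }

    // Reducer: sorted pair keys arrive grouped by left word; buffer one stripe and
    // flush it whenever the left word changes (and once more at the end, as cleanup()).
    static Map<String, Map<String, Double>> run(List<List<String>> splits) {
        TreeMap<String, Integer> shuffled = new TreeMap<>();
        for (List<String> split : splits) {
            mapSplit(split).forEach((pair, c) -> shuffled.merge(pair, c, Integer::sum));
        }
        Map<String, Map<String, Double>> frequencies = new TreeMap<>();
        String current = null;
        Map<String, Integer> stripe = new TreeMap<>();
        for (Map.Entry<String, Integer> e : shuffled.entrySet()) {
            String[] pair = e.getKey().split(" ");
            if (current != null && !current.equals(pair[0])) {
                frequencies.put(current, normalize(stripe));
                stripe = new TreeMap<>();
            }
            current = pair[0];
            stripe.put(pair[1], e.getValue());
        }
        if (current != null) {
            frequencies.put(current, normalize(stripe));
        }
        return frequencies;
    }
}
```

The design keeps the Pair approach's small intermediate records while computing marginals in memory on the reducer side. The reducer only ever buffers one stripe at a time.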

Prerequisites

Cloudera & Eclipse Setup

Running

  • Run the project in Eclipse, or
  • Run the Bash script file.

Project 2: Spark (Scala)


Using Spark, compute the mean and standard deviation of gas consumption in the UK.
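A distributed mean and standard deviation needs per-partition partial results that can be merged. One numerically stable choice is (count, mean, M2), updated with Welford's method and merged with the Chan et al. formula. The sketch below is plain Java rather than the project's Scala and does not use Spark. The dataset's schema is not described here, so it assumes the consumption values are already parsed into doubles and grouped into partitions. The class RunningStats and its methods are hypothetical. In Spark itself, an RDD of doubles also offers stats(), which returns a StatCounter with mean, stdev (population) and sampleStdev.

```java
import java.util.List;

// Mergeable running statistics: count, mean, and M2 (sum of squared deviations from the mean).
final class RunningStats {
    long count;
    double mean;
    double m2;

    // Welford update for a single value.
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    // Chan et al. merge of another partial result (e.g., from a different partition).
    RunningStats merge(RunningStats other) {
        if (other.count == 0) {
            return this;
        }
        if (count == 0) {
            count = other.count;
            mean = other.mean;
            m2 = other.m2;
            return this;
        }
        long n = count + other.count;
        double delta = other.mean - mean;
        mean += delta * other.count / n;
        m2 += other.m2 + delta * delta * ((double) count * other.count / n);
        count = n;
        return this;
    }

    double populationStdDev() {
        return Math.sqrt(m2 / count);
    }

    double sampleStdDev() {
        return Math.sqrt(m2 / (count - 1));
    }

    // Build one partial result per partition, then merge them, as a reduce over partitions would.
    static RunningStats of(List<List<Double>> partitions) {
        RunningStats total = new RunningStats();
        for (List<Double> partition : partitions) {
            RunningStats partial = new RunningStats();
            for (double x : partition) {
                partial.add(x);
            }
            total.merge(partial);
        }
        return total;
    }
}
```

For the partitions [2, 4, 4, 4] and [5, 5, 7, 9], the merged result has mean 5 and population standard deviation 2. The sample standard deviation is sqrt(32/7). Which of the two the project reports is not stated here.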

Dataset

Prerequisites

Spark & Scala Setup

Running