Hadoop Workshop Exercises

Projects

1. Word Count Simple

Count the number of words consisting of \w+ characters.

See WordCountSimpleToolMain class.
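The counting rule above can be sketched in plain Java, without any Hadoop dependency. This is only an illustration of the \w+ tokenizing and per-word summing, not the workshop's actual job: the class name WordCountSketch and its method are hypothetical, and the sketch assumes counting is case-sensitive.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration of the word-count logic (hypothetical names,
// not the workshop's Hadoop classes).
final class WordCountSketch {
    // A word is a maximal run of \w characters: letters, digits, underscore.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // "Map" step: find each word. "Reduce" step: sum one per occurrence.
    // Assumption: words are counted case-sensitively.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            counts.merge(m.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Note that "Hadoop's" splits into "Hadoop" and "s" under \w+.
        countWords("Hello, world Hello, Hadoop's")
                .forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

In the real job the same two steps would be split between a mapper and a reducer.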

2. Word Count with Combiner & Partitioner

Count the number of words consisting of \w+ characters.

The combiner aggregates the output of each map task locally.

The partitioner divides the map output among three reducers according to whether the mapper output key string starts with 0-9, A-M or N-Z.

See WordCountWithKeyPrefixPartitionerToolMain class.
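A minimal plain-Java sketch of the two ideas in this exercise follows. It does not use the Hadoop Partitioner or Reducer classes, and the names KeyPrefixPartitionSketch, combine and partitionFor are hypothetical. It also assumes that letters are matched case-insensitively and that any other key (for example one starting with an underscore) falls into partition 0; the README does not state either rule.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical names, no Hadoop dependency.
final class KeyPrefixPartitionSketch {
    // Combiner idea: sum the (word, 1) pairs of ONE map task locally,
    // so fewer records have to be sent to the reducers.
    static Map<String, Integer> combine(List<String> mapOutputWords) {
        Map<String, Integer> partial = new LinkedHashMap<>();
        for (String word : mapOutputWords) {
            partial.merge(word, 1, Integer::sum);
        }
        return partial;
    }

    // Partitioner idea: pick one of three reducers from the key's first character.
    // Assumptions: case-insensitive letters; empty or other keys go to partition 0.
    static int partitionFor(String key) {
        if (key.isEmpty()) {
            return 0;
        }
        char c = Character.toUpperCase(key.charAt(0));
        if (c >= '0' && c <= '9') return 0;
        if (c >= 'A' && c <= 'M') return 1;
        if (c >= 'N' && c <= 'Z') return 2;
        return 0;
    }

    public static void main(String[] args) {
        Map<String, Integer> partial = combine(Arrays.asList("Hello", "world", "Hello", "0world"));
        partial.forEach((word, n) ->
                System.out.println("reducer " + partitionFor(word) + "\t" + word + "\t" + n));
    }
}
```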

3. Char Count Simple with Writable & Combiner

Count the number of occurrences of each character on each line of each file.

Example:

The input files hello.txt and goodbye.txt have the contents below.

hello.txt Content:

Hello

World!

goodbye.txt Content:

Good

Bye!

Output should be:

{FILENAME}\t{OFFSET}\t{CHARACTER}\t{COUNT}

hello.txt 0 H 1

hello.txt 0 e 1

hello.txt 0 l 2

...

hello.txt 6 W 1

hello.txt 6 o 1

...

goodbye.txt 0 G 1

goodbye.txt 0 o 2

...

goodbye.txt 5 ! 1

The combiner aggregates the output of each map task locally.

A Writable is used as the map output key.

See CharCountSimpleToolMainTest class.
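The composite key in this exercise (file name, line offset, character) can be sketched in plain Java as below. CharKey is a hypothetical stand-in for the Writable key, not the workshop's class. The sketch assumes "\n" line endings and single-byte characters, so that a line's offset equals the number of characters before it.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Illustration only: hypothetical names, no Hadoop dependency.
final class CharCountSketch {
    // Stand-in for the Writable map output key: (file name, line offset, character).
    static final class CharKey implements Comparable<CharKey> {
        final String fileName;
        final long offset;
        final char ch;

        CharKey(String fileName, long offset, char ch) {
            this.fileName = fileName;
            this.offset = offset;
            this.ch = ch;
        }

        // Sort by file name, then offset, then character, as a key comparator would.
        @Override
        public int compareTo(CharKey o) {
            int c = fileName.compareTo(o.fileName);
            if (c != 0) return c;
            c = Long.compare(offset, o.offset);
            return c != 0 ? c : Character.compare(ch, o.ch);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CharKey && compareTo((CharKey) o) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(fileName, offset, ch);
        }

        @Override
        public String toString() {
            return fileName + "\t" + offset + "\t" + ch;
        }
    }

    // Counts each character per line; the line's start offset is part of the key.
    // Assumptions: "\n" line endings and one byte per character.
    static Map<CharKey, Integer> countChars(String fileName, String content) {
        Map<CharKey, Integer> counts = new TreeMap<>();
        long offset = 0;
        for (String line : content.split("\n", -1)) {
            for (char c : line.toCharArray()) {
                counts.merge(new CharKey(fileName, offset, c), 1, Integer::sum);
            }
            offset += line.length() + 1; // +1 for the newline character
        }
        return counts;
    }

    public static void main(String[] args) {
        countChars("hello.txt", "Hello\nWorld!")
                .forEach((key, n) -> System.out.println(key + "\t" + n));
    }
}
```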

4. First Char Count

Count the number of words grouped by the first character of each word. The output includes both the number of unique words and the total number of word occurrences.

See FirstCharCountToolMain class.

5. User History

See UserHistoryToolMain class.

6. Reduce side join

Count words and, if links.txt includes a word, join that word's count with its link.

See ReduceSideJoinToolMain class.
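The general reduce-side join technique can be sketched in plain Java as follows. This is not the workshop's implementation, and several details are assumptions because the README does not specify them: links.txt is taken to hold one "word<TAB>link" pair per line, words without a link are still emitted with their count, and all names (ReduceSideJoinSketch, shuffle, reduce) and example URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: hypothetical names and file format, no Hadoop dependency.
final class ReduceSideJoinSketch {
    // "Map + shuffle": records from both sources are keyed by word and tagged
    // with their origin so the reduce step can tell them apart.
    static Map<String, List<String>> shuffle(String text, List<String> linkLines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            grouped.computeIfAbsent(m.group(), k -> new ArrayList<>()).add("COUNT\t1");
        }
        for (String line : linkLines) {
            String[] fields = line.split("\t", 2); // assumed format: word<TAB>link
            if (fields.length == 2) {
                grouped.computeIfAbsent(fields[0], k -> new ArrayList<>()).add("LINK\t" + fields[1]);
            }
        }
        return grouped;
    }

    // "Reduce": per word, sum the counts and attach the link if one was seen.
    static List<String> reduce(Map<String, List<String>> grouped) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            int count = 0;
            String link = null;
            for (String value : e.getValue()) {
                if (value.startsWith("COUNT\t")) {
                    count += Integer.parseInt(value.substring("COUNT\t".length()));
                } else if (value.startsWith("LINK\t")) {
                    link = value.substring("LINK\t".length());
                }
            }
            if (count == 0) {
                continue; // a link whose word never appears in the text
            }
            String record = e.getKey() + "\t" + count;
            out.add(link == null ? record : record + "\t" + link);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList("Hadoop\thttp://example.org/hadoop");
        reduce(shuffle("Hello Hadoop Hello", links)).forEach(System.out::println);
    }
}
```

Tagging each value with its source is what lets a single reducer call see both sides of the join for one key.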

Example (this input and output illustrate project 4, First Char Count):

The input file hello.txt has the content below.

hello.txt Content:

0world Hello, world Hello, Hadoop's Hello, MapReduce

Output should be:

{FIRST_CHARACTER_OF_WORD}\t{UNIQUE_COUNT_OF_WORD_APPEAR}\t{TOTAL_COUNT_OF_WORD_APPEAR}

0 1 1

h 2 4

m 1 1

s 1 1

w 1 1
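The First Char Count output above can be reproduced with a short plain-Java sketch. It is an illustration under assumptions, not the workshop's job: the class name FirstCharCountSketch is hypothetical, and lowercasing words before grouping is inferred from the example output (where "Hello" and "Hadoop" both count under h) rather than stated in the README.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: hypothetical names, no Hadoop dependency.
final class FirstCharCountSketch {
    // Returns first character -> {unique word count, total word count}.
    // Assumption: words are lowercased before grouping (inferred from the example).
    static Map<Character, int[]> count(String text) {
        Map<Character, Map<String, Integer>> grouped = new TreeMap<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            String word = m.group().toLowerCase(Locale.ROOT);
            grouped.computeIfAbsent(word.charAt(0), k -> new HashMap<>())
                    .merge(word, 1, Integer::sum);
        }
        Map<Character, int[]> result = new TreeMap<>();
        for (Map.Entry<Character, Map<String, Integer>> e : grouped.entrySet()) {
            int total = 0;
            for (int n : e.getValue().values()) {
                total += n;
            }
            // Unique count = number of distinct words; total = sum of occurrences.
            result.put(e.getKey(), new int[] {e.getValue().size(), total});
        }
        return result;
    }

    public static void main(String[] args) {
        count("0world Hello, world Hello, Hadoop's Hello, MapReduce")
                .forEach((c, n) -> System.out.println(c + "\t" + n[0] + "\t" + n[1]));
    }
}
```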

Dependencies

Cloudera CDH3 version 0.20.2-cdh3u0

Ref. Cloudera CDH3 Maven Repository

Importing to Eclipse

Run command:

mvn eclipse:eclipse

If you created or checked out the project with Eclipse, you only have to refresh the project in your workspace.

Otherwise, import the project into your Eclipse workspace (from the menu bar, select File > Import > Existing Projects into Workspace).

Ref. Maven - Guide to using Eclipse with Maven 2.x

Building

To build the executable jar and shell scripts, run:

mvn package

or

mvn -P pseudo package

The first command builds for standalone mode; the second builds for pseudo-distributed mode.

Executing

To execute the WordCount job with the hadoop command, run:

hadoop jar target/hadoop-workshop-0.0.1.jar com.knownstylenolife.hadoop.workshop.count.tool.WordCountSimpleToolMain input output

To execute the WordCount job with the generated shell script, run:

sh target/appassembler/bin/wordcount input output

This shell script is generated by the Appassembler Maven Plug-In.

Executing with DEBUG log level

To set the log level of the mapper and reducer tasks to DEBUG, add "DEBUG" as the last argument.

hadoop jar target/hadoop-workshop-0.0.1.jar com.knownstylenolife.hadoop.workshop.count.tool.WordCountSimpleToolMain input output DEBUG

or

sh target/appassembler/bin/wordcount input output DEBUG

License

Apache Software License 2.0.

Author

Toshiaki Toyama (@manboubird)
