Commit c851304

Working on hands on 4 mapreduce for 6.033
1 parent 240b856 commit c851304

3 files changed

+114410
-0
lines changed

6.033/hands_on_4_mapreduce.tex

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
\documentclass[psamsfonts]{amsart}

%-------Packages---------
\usepackage{amssymb,amsfonts}
\usepackage{enumerate}
\usepackage[margin=1in]{geometry}
\usepackage{amsthm}
\usepackage{theorem}
\usepackage{verbatim}
\usepackage{tikz}
\usetikzlibrary{shapes,arrows}

\newenvironment{sol}{{\bfseries Solution:}}{\qedsymbol}
\newenvironment{prob}{{\bfseries Problem:}}{}

\bibliographystyle{plain}

\voffset = -10pt
\headheight = 0pt
\topmargin = -20pt
\textheight = 690pt

%--------Meta Data: Fill in your info------
\title{6.033 \\
Computer Systems Engineering \\
Hands On 4: MapReduce}

\author{John Wang}

\begin{document}

\maketitle

\section{Studying mapreduce.py}

\begin{enumerate}
\item The first two parameters of WordCount are the number of map tasks and the number of reduce tasks that the WordCount job will spawn, respectively. The input is split into as many temporary files as there are map tasks, and the reduce work is split across as many processes as there are reduce tasks.
\item The run() method is defined in WordCount's superclass, the MapReduce class. It first calls doMap() for every map task, then calls doReduce() for every reduce task. Each doMap() call takes one piece of the split input, calls the Map() function defined in WordCount on it, and writes the results to a file so that doReduce() can read them later. Once all the doMap() jobs have finished, the doReduce() jobs begin. Each of these reads the files created during the doMap() phase and calls Reduce(), as defined in the WordCount class, on their contents.
\item The keyvalue is the byte offset from the beginning of the bible file. It serves as a key to the portion of the bible file that the Map() job will consider. The value is the portion of the bible file that the Map() job runs over.
\item The key is a word found in the bible file, and keyvalues is a list of tuples. Each tuple in keyvalues contains the word as its first item and the number of occurrences of the word as its second item.

\section{Modifying mapreduce.py}
\item There are 4 calls to doMap() and 2 calls to doReduce(). These are the values passed as the first two parameters when the WordCount object was initialized, which set the number of map and reduce jobs, respectively, that the WordCount object spawns.
\item All of the doMap() jobs should run in parallel, and the doReduce() jobs should also run in parallel with one another. This is the case because the code that calls these jobs is the following:
\begin{verbatim}
regions = pool.map(self.doMap, range(0, self.maptask))
partitions = pool.map(self.doReduce, range(0, self.reducetask))
\end{verbatim}

These two commands spawn multiple processes to carry out the doMap() calls and then the doReduce() calls. They spawn a number of processes equal to the number of map and reduce jobs that were set when the WordCount object was instantiated.
\item A single doMap() processes about 1208690 bytes of input. The exact number varies by a few bytes because the input is split at the first whitespace character past the nominal chunk size of 1208690 bytes.
\item A single doReduce() processes about 2250 keys. This number is not exact; it depends on the number of title words in the map jobs associated with each doReduce() job.
\end{enumerate}
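
The answers above describe three mechanisms: splitting the input at whitespace near a fixed chunk size, running every map task before any reduce task, and dispatching each phase through a pool's map() call. The following is a minimal sketch of that flow, not the actual mapreduce.py from the assignment. Every function name in it is hypothetical, the assignment of keys to reduce tasks by hash is an assumption, and it uses a thread-backed pool (which exposes the same map() interface as a process pool) so that it runs unchanged in any environment.

```python
from collections import defaultdict
from multiprocessing.pool import ThreadPool  # thread-backed stand-in with the same map() API


def split_at_whitespace(text, n_chunks):
    """Split text into n_chunks pieces, moving each cut forward to the next whitespace."""
    chunk_size = len(text) // n_chunks
    chunks, start = [], 0
    for _ in range(n_chunks - 1):
        end = min(start + chunk_size, len(text))
        # Advance the cut until it lands on whitespace so no word is split in two.
        while end < len(text) and not text[end].isspace():
            end += 1
        chunks.append(text[start:end])
        start = end
    chunks.append(text[start:])  # the last chunk takes whatever remains
    return chunks


def map_chunk(chunk):
    """Map step (hypothetical): emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def partition(pairs_per_chunk, n_reduce):
    """Group intermediate pairs by key and assign each key to one reduce task."""
    buckets = [defaultdict(list) for _ in range(n_reduce)]
    for pairs in pairs_per_chunk:
        for key, value in pairs:
            # Assumption: keys are assigned to reduce tasks by hash.
            buckets[hash(key) % n_reduce][key].append((key, value))
    return buckets


def reduce_bucket(bucket):
    """Reduce step (hypothetical): sum the counts collected for each key."""
    return {key: sum(v for _, v in keyvalues) for key, keyvalues in bucket.items()}


def word_count(text, n_map=4, n_reduce=2):
    """Run all map tasks, then all reduce tasks, each phase through pool.map()."""
    with ThreadPool(max(n_map, n_reduce)) as pool:
        mapped = pool.map(map_chunk, split_at_whitespace(text, n_map))
        reduced = pool.map(reduce_bucket, partition(mapped, n_reduce))
    counts = {}
    for part in reduced:
        counts.update(part)
    return counts
```

Because each cut is pushed forward to the next whitespace character, chunk lengths differ from the nominal chunk size by a few characters, which matches the behaviour described for doMap(). Similarly, the number of keys each reduce task sees depends on which words the map tasks emitted, so it is only approximately even.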
\end{document}
