Global Positioning System Reference
In-Depth Information
The partition is small enough to be solved in a
single node. Results written to final output in HDFS
(Hadoop Distributed File System ).
The partition will need to be further re -partitioned
in additional MapReduce rounds. Intermediate
data is written to HDFS.
Single-node
Distributed
Input Data
MR 1
P1
P2
P3
MR 2
MR 3
MR 4
P4
P5
P6
P7
P8
P9
P10
P11
P12
MR 5
MR 6
P13
P14
P15
P16
P17
P18
Fig. 4. Example of MRSimJoin rounds and partitions.
Base Partition Round
A base partition round processes the initial input data or a base partition
previously generated by a base partition round. The goal of a base partition
MapReduce job is to partition its input data and produce: the result links
for partitions that are small enough to be processed in a single node, and
intermediate data for partitions that require further processing. The main
routine sets up each base partition MapReduce job to use Map_base and
Reduce_base as the map and reduce functions, respectively. Additionally,
the default partition function is replaced by Partition_base and the default
sortCompare and groupCompare functions are replaced by Compare_base .
Map_base , the map function for the base partition rounds, is presented in
Algorithm 2. The format of the input key-value pairs, i.e., k 1 , v1 , is: k 1 = id,
v 1 = ( id, elem ), and the format of the intermediate key-value pairs, i.e., k 2 , v 2,
is: k 2 = ( part,win ) , v 2 = ( id, elem, part ), where part identifi es the base partition
of the intermediate record and win the window partition of the record. We
use the value -1 when a given parameter is not applicable or will not be
needed in the future. The MapReduce framework divides the job input
data into chunks and creates map tasks in multiple nodes to process them.
Each map task is called multiple times and each call executes the Map_base
function for a given record ( id, ( id, elem )) of the input data. The Map_base
function identifi es the closest pivot p to elem (line 1). The function then
outputs one intermediate key-value pair of the form (( p,− 1) , ( id, elem,− 1))
for the base partition that elem belongs to (line 2) and one pair of the form
(( p, i ) , ( id, elem, p )) for each window-pair partition (corresponding to pivots
Search WWH ::




Custom Search