Global Positioning System Reference
In-Depth Information
The partition is small enough to be solved in a
single node. Results written to final output in HDFS
(Hadoop Distributed File System ).
The partition will need to be further re -partitioned
in additional MapReduce rounds. Intermediate
data is written to HDFS.
Single-node
Distributed
Input Data
MR
1
P1
P2
P3
MR
2
MR
3
MR
4
P4
P5
P6
P7
P8
P9
P10
P11
P12
MR
5
MR
6
P13
P14
P15
P16
P17
P18
Fig. 4.
Example of MRSimJoin rounds and partitions.
Base Partition Round
A base partition round processes the initial input data or a base partition
previously generated by a base partition round. The goal of a base partition
MapReduce job is to partition its input data and produce: the result links
for partitions that are small enough to be processed in a single node, and
intermediate data for partitions that require further processing. The main
routine sets up each base partition MapReduce job to use
Map_base
and
Reduce_base
as the
map
and
reduce
functions, respectively. Additionally,
the default
partition
function is replaced by
Partition_base
and the default
sortCompare
and
groupCompare
functions are replaced by
Compare_base
.
Map_base
, the map function for the base partition rounds, is presented in
Algorithm 2. The format of the input key-value pairs, i.e.,
k
1
, v1
, is:
k
1 =
id,
v
1 = (
id, elem
), and the format of the intermediate key-value pairs, i.e.,
k
2
, v
2,
is:
k
2 = (
part,win
)
, v
2 = (
id, elem, part
), where
part
identifi es the base partition
of the intermediate record and
win
the window partition of the record. We
use the value -1 when a given parameter is not applicable or will not be
needed in the future. The MapReduce framework divides the job input
data into chunks and creates
map
tasks in multiple nodes to process them.
Each
map
task is called multiple times and each call executes the
Map_base
function for a given record (
id,
(
id, elem
)) of the input data. The
Map_base
function identifi es the closest pivot
p
to
elem
(line 1). The function then
outputs one intermediate key-value pair of the form ((
p,−
1)
,
(
id, elem,−
1))
for the base partition that
elem
belongs to (line 2) and one pair of the form
((
p, i
)
,
(
id, elem, p
)) for each window-pair partition (corresponding to pivots
Search WWH ::
Custom Search