Global Positioning System Reference
In-Depth Information
The MapReduce framework partitions the intermediate data generated
by map tasks. This partitioning is performed calling the
Partition_base
function presented in Algorithm 3.
Partition_base
receives an intermediate
key, i.e.,
k
2 = (
part,win
), as input and generates the corresponding partition
number.
C
1−
C
3 are constant prime numbers and
NUMPARTITIONS
is the
maximum number of partitions set by the MapReduce framework. The
partition number for an intermediate key that corresponds to a base partition
is computed using a hash function on
part
(line 2). When the key corresponds
to a window-pair partition, the partition number is computed using a hash
function on
min
(
part,win
) and
max
(
part,win
) (line 6). This last hash function
will generate the same partition number for all intermediate records of a
window-pair partition independently of the specifi c window they belong
to. In the scenario of Fig. 5,
Partition_base
will generate the same partition
number, i.e. (
P
0 ×
C
1) mod
NUMPARTITIONS
, for all the intermediate keys
that correspond to partition
P
0. Similarly, the function will generate the
same partition number, i.e., (
P
0×
C
2+
P
1×
C
3) mod
NUMPARTITIONS
, for
all the intermediate keys that correspond to partition
P
0_
P
1.
Algorithm 3:
Partition_base
()
Input:
k
2.
k
2 = (
part,win
)
Output:
k
2's partition number
1:
if
win
=
−
1
then
// base partition
2:
partition ←
(
part × C
1) mod
NUMPARTITIONS
3:
else
// window-pair partition
4:
minVal ← min
(
part,win
)
5:
maxVal ← max
(
part,win
)
6:
partition ←
(
minVal × C
2 +
maxVal × C
3) mod
NUMPARTITIONS
7:
end if
After identifying the partition numbers of intermediate records, the
shuffl e phase of the MapReduce job sends the intermediate records to
their corresponding reduce nodes. The intermediate records received at
each reduce node are sorted and grouped using the
Compare_base
function
presented in Algorithm 4. The main goal of the
Compare_base
function is
to group the intermediate records that belong to the same partition. The
function establishes the order of partitions shown in Fig. 6a. Base partitions
have a lower order than window-pair partitions. Multiple base partitions
are ordered based on their pivot indices. Multiple window-pair partitions
are ordered based on the two associated pivot indices of each partition.
Compare_base
receives as input two intermediate record keys, i.e.,
k
2
1
,
k
2
2
, and returns 0 (when
k
2
1
and
k
2
2
belong to the same group),
−
1 (when
k
2
1
has lower order than
k
2
2
), or +1 (when
k
2
1
has higher order than
k
2
2
).
The algorithm considers all the possible combinations of the intermediate
Search WWH ::
Custom Search