Note that, in the case of Reduce_windowPair, all partitions that are stored for further processing are set to be repartitioned by a future window-pair partition round. This is because the links generated in a window-pair round, or in any of its partitions, should always be window links. In the scenario represented in Fig. 7, the MapReduce framework calls the Reduce_windowPair function for each partition of Fig. 8b:
$Q_0$, $Q_1$, $Q_0\_Q_1\{1\}$ and $Q_0\_Q_1\{2\}$. Observe that the value of uAttr in the output directory name is $k_2$. This component ensures unique directory names. Assuming that the values of $k_2$ of $Q_0\_Q_1\{1\}$ and $Q_0\_Q_1\{2\}$ belong to their bottom windows, the values of uAttr are:
$Q_0$: $(Q_0, -1, -1)$; $Q_1$: $(Q_1, -1, -1)$; $Q_0\_Q_1\{1\}$: $(Q_0, Q_1, P_0)$; and $Q_0\_Q_1\{2\}$: $(Q_0, Q_1, P_1)$.
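The Java sketch below illustrates the uniqueness argument using the four uAttr values listed above. The directory layout `<outDir>/<a>_<b>_<c>`, the output directory `out`, and the class name `UAttrDirNames` are hypothetical, chosen only for illustration. The point it shows is that $Q_0\_Q_1\{1\}$ and $Q_0\_Q_1\{2\}$ share their first two uAttr components, so dropping the last component makes their directory names collide.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: turns uAttr triples into output directory names.
// The "<outDir>/<a>_<b>_<c>" layout is an assumption made for illustration only.
final class UAttrDirNames {
    // uAttr values of the four partitions of Fig. 8b.
    static final List<String[]> EXAMPLE = List.of(
            new String[] {"Q0", "-1", "-1"},   // Q0
            new String[] {"Q1", "-1", "-1"},   // Q1
            new String[] {"Q0", "Q1", "P0"},   // Q0_Q1{1}
            new String[] {"Q0", "Q1", "P1"});  // Q0_Q1{2}

    private UAttrDirNames() {}

    /** Builds one directory name per triple; optionally drops the third component. */
    static Set<String> dirNames(String outDir, List<String[]> triples, boolean useThird) {
        Set<String> names = new LinkedHashSet<>();
        for (String[] t : triples) {
            String name = outDir + "/" + t[0] + "_" + t[1];
            if (useThird) {
                name += "_" + t[2];
            }
            names.add(name); // duplicates are silently merged by the set
        }
        return names;
    }

    public static void main(String[] args) {
        Set<String> full = dirNames("out", EXAMPLE, true);
        Set<String> partial = dirNames("out", EXAMPLE, false);
        full.forEach(System.out::println);
        // With all three components every partition gets its own directory;
        // without the third one, Q0_Q1{1} and Q0_Q1{2} map to the same name.
        System.out.println(full.size() + " vs " + partial.size());
    }
}
```

Running `main` prints the four names followed by `4 vs 3`: keeping the last component yields four distinct directories, while omitting it yields only three.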
Enhancements for Geographical Distance
Since the MRSimJoin solution presented in the MRSimJoin Algorithm subsection is based on the generalized hyperplane distance, it can be used with any dataset that lies in a metric space. The solution can, however, be enhanced in cases where the distance from a record to the hyperplane between two partitions can be computed exactly (Jacox and Samet 2008).
In the case of the geographical distance geoDist defined in the Geographic Data and Distance Functions subsection (Euclidean distance on a plane onto which a spherical Earth was projected), the exact distance from a record $t$ to the hyperplane that separates the partitions of two pivots $P_0$ and $P_1$ is given by:
$$\mathit{hDist}(t, P_0, P_1) = \frac{\mathit{geoDist}(t, P_0)^2 - \mathit{geoDist}(t, P_1)^2}{2 \times \mathit{geoDist}(P_0, P_1)}.$$
To use this distance, the GHP distance should be replaced by hDist in line 5 of Map_base and also in line 5 of Map_windowPair.
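A short worked comparison shows why the exact distance helps. Write $d_0 = \mathit{geoDist}(t, P_0)$, $d_1 = \mathit{geoDist}(t, P_1)$ and $d_{01} = \mathit{geoDist}(P_0, P_1)$. The formula then factors as $\mathit{hDist} = \frac{(d_0 - d_1)(d_0 + d_1)}{2 d_{01}}$. Because $d_0 + d_1 \ge d_{01}$ by the triangle inequality, $|\mathit{hDist}| \ge |d_0 - d_1|/2$, so the exact distance is never smaller than the standard generalized-hyperplane lower bound $|d_0 - d_1|/2$. The Java sketch below rests on several assumptions. It takes coordinates that are already projected onto the plane and uses `Math.hypot` as a stand-in for geoDist. For illustration only, it also treats a record as belonging to a window when its absolute hyperplane distance is at most eps.

```java
// Sketch: compares the generalized-hyperplane (GHP) lower bound with the exact
// hyperplane distance hDist on a plane (points assumed already projected).
final class HyperplaneDistance {
    private HyperplaneDistance() {}

    /** Planar Euclidean distance, used here as a stand-in for geoDist. */
    static double geoDist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    /** GHP lower bound (d0 - d1) / 2; valid in any metric space. */
    static double ghpDist(double[] t, double[] p0, double[] p1) {
        return (geoDist(t, p0) - geoDist(t, p1)) / 2.0;
    }

    /** Exact signed distance from t to the bisector of P0 and P1 (hDist formula). */
    static double hDist(double[] t, double[] p0, double[] p1) {
        double d0 = geoDist(t, p0);
        double d1 = geoDist(t, p1);
        return (d0 * d0 - d1 * d1) / (2.0 * geoDist(p0, p1));
    }

    public static void main(String[] args) {
        double[] p0 = {0.0, 0.0};
        double[] p1 = {4.0, 0.0};
        double[] t = {1.0, 3.0};   // the bisector is x = 2, so the true distance is 1
        double eps = 0.75;         // hypothetical similarity threshold
        double ghp = ghpDist(t, p0, p1);
        double exact = hDist(t, p0, p1);
        System.out.printf("GHP bound = %.4f, hDist = %.4f%n", ghp, exact);
        // Assumed window test |distance| <= eps: only the looser GHP bound flags t.
        System.out.println("window (GHP):   " + (Math.abs(ghp) <= eps));
        System.out.println("window (hDist): " + (Math.abs(exact) <= eps));
    }
}
```

With $P_0 = (0, 0)$, $P_1 = (4, 0)$ and $t = (1, 3)$, the exact distance is 1, while the GHP bound is only about 0.54. With eps = 0.75, only the GHP test would copy $t$ into the window partition, so the exact distance avoids an unnecessary copy.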
Implementation in Hadoop
The presented MRSimJoin algorithms are generic enough to be implemented in any MapReduce framework. This section presents a few additional guidelines for their implementation on the popular Hadoop MapReduce framework (Apache Hadoop 2013).
Distribution of atomic parameters. One of the tasks of the MRJob function, called in the main MRSimJoin routine, is to make sure that the provided atomic parameters, i.e., outDir, numPiv, eps, and memT, are available at every node that will be used in the MapReduce job. In Hadoop, this can be done using the job configuration object jobConf and its set and get methods.
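With a real jobConf, the driver would call `jobConf.set(name, value)` before submitting the job. Each task would then read the values back with `get(name)`, which under the old `mapred` API is typically done inside `configure(JobConf)`. `Configuration` also provides typed helpers such as `setFloat`/`getFloat`. The sketch below keeps the code runnable without a Hadoop dependency by using a plain `Map<String, String>` as a stand-in for jobConf. The key names and the Java types chosen for numPiv, eps and memT are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in sketch: a Map<String, String> plays the role of Hadoop's jobConf,
// which likewise stores configuration values as strings.
// Key names and field types are hypothetical assumptions.
final class AtomicParams {
    static final String OUT_DIR = "mrsimjoin.outDir";
    static final String NUM_PIV = "mrsimjoin.numPiv";
    static final String EPS = "mrsimjoin.eps";
    static final String MEM_T = "mrsimjoin.memT";

    final String outDir; // output directory
    final int numPiv;    // number of pivots
    final double eps;    // similarity threshold
    final long memT;     // memory threshold (assumed to be a byte count)

    AtomicParams(String outDir, int numPiv, double eps, long memT) {
        this.outDir = outDir;
        this.numPiv = numPiv;
        this.eps = eps;
        this.memT = memT;
    }

    /** Driver side (MRJob): store each parameter as a string, as jobConf.set does. */
    void writeTo(Map<String, String> conf) {
        conf.put(OUT_DIR, outDir);
        conf.put(NUM_PIV, Integer.toString(numPiv));
        conf.put(EPS, Double.toString(eps));
        conf.put(MEM_T, Long.toString(memT));
    }

    /** Task side (map or reduce): parse the parameters back, as with jobConf.get. */
    static AtomicParams readFrom(Map<String, String> conf) {
        return new AtomicParams(
                conf.get(OUT_DIR),
                Integer.parseInt(conf.get(NUM_PIV)),
                Double.parseDouble(conf.get(EPS)),
                Long.parseLong(conf.get(MEM_T)));
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>(); // stand-in for the real jobConf
        new AtomicParams("/user/demo/out", 10, 0.5, 64L * 1024 * 1024).writeTo(jobConf);
        AtomicParams onNode = AtomicParams.readFrom(jobConf); // what a task would see
        System.out.println(onNode.numPiv + " pivots, eps = " + onNode.eps);
    }
}
```

Serializing through strings mirrors how Hadoop configuration values travel with the job. `Double.toString` followed by `Double.parseDouble` round-trips eps exactly, so every node sees the same threshold.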
Distribution of pivots. MRJob also sends the list of pivots to every node that will execute a map task. In Hadoop, this can be done using the