Global Positioning System Reference
Note that, in the case of Reduce_windowPair, all partitions that are stored
for further processing are set to be repartitioned by a future window-pair
partition round. This is the case because the links generated in a window-
pair round or in any of its partitions should always be window links. In
the scenario represented in Fig. 7, the MapReduce framework calls the
Reduce_windowPair function for each partition of Fig. 8b: Q0, Q1, Q0_Q1{1}
and Q0_Q1{2}. Observe that the value of uAttr in the output directory name
is k2. This component ensures unique directory names. Assuming that the
values of k2 of Q0_Q1{1} and Q0_Q1{2} belong to their bottom windows,
the values of uAttr are: Q0: (Q0, -1, -1), Q1: (Q1, -1, -1), Q0_Q1{1}: (Q0, Q1, P0),
and Q0_Q1{2}: (Q0, Q1, P1).
Enhancements for Geographical Distance
Since the MRSimJoin solution presented in the MRSimJoin Algorithm
subsection is based on the generalized hyperplane distance, it could be used
with any dataset that lies in a metric space. The solution, however, could
be enhanced in cases where the distance from a record to the hyperplane
between two partitions can be computed exactly (Jacox and Samet 2008).
In the case of the geographical distance geoDist defined in the Geographic
Data and Distance Functions subsection (Euclidean distance on a plane
where a Spherical Earth was projected), the exact distance from a record
t to the hyperplane that separates the partitions of two pivots P0 and P1 is
given by:
hDist(t, P0, P1) = (geoDist(t, P0)^2 − geoDist(t, P1)^2) / (2 × geoDist(P0, P1)).
To use this distance, the GHP distance should be replaced by hDist in
line 5 of Map_base and also in line 5 of Map_windowPair.
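
The formula above can be checked numerically with a minimal sketch. It assumes records and pivots are already projected (x, y) points on the plane, so that geoDist reduces to the Euclidean distance. The function names geo_dist, h_dist and ghp_dist are illustrative and not taken from the MRSimJoin pseudocode. The ghp_dist helper uses the standard generalized hyperplane lower bound, (d(t, P0) − d(t, P1)) / 2, included only for comparison.

```python
import math

def geo_dist(a, b):
    # Assumption: a and b are projected (x, y) points on a plane.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def h_dist(t, p0, p1):
    # Exact signed distance from t to the hyperplane between p0 and p1.
    # Negative when t is closer to p0, positive when t is closer to p1.
    d0 = geo_dist(t, p0)
    d1 = geo_dist(t, p1)
    return (d0 ** 2 - d1 ** 2) / (2 * geo_dist(p0, p1))

def ghp_dist(t, p0, p1):
    # Generalized hyperplane lower bound, valid in any metric space.
    return (geo_dist(t, p0) - geo_dist(t, p1)) / 2

p0, p1 = (0.0, 0.0), (4.0, 0.0)   # hyperplane is the line x = 2
t = (3.0, 5.0)                    # lies 1 unit past the hyperplane, on p1's side

exact = h_dist(t, p0, p1)
bound = ghp_dist(t, p0, p1)
print(exact, bound)
```

In this example the exact distance is 1 while the generalized hyperplane bound is smaller, so a window test of the form |distance| <= eps would admit fewer records when the exact distance is used. This is the sense in which replacing the GHP distance by hDist can be an enhancement.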
Implementation in Hadoop
The presented MRSimJoin algorithms are generic enough to be implemented
in any MapReduce framework. This section presents a few additional
guidelines for their implementation on the popular Hadoop MapReduce
framework (Apache Hadoop 2013).
Distribution of atomic parameters. One of the tasks of the MRJob function,
called in the main MRSimJoin routine, is to make sure that the provided
atomic parameters, i.e., outDir, numPiv, eps and memT, are available at every
node that will be used in the MapReduce job. In Hadoop, this can be done
using the job configuration object jobConf and its methods set and get.
Distribution of pivots. MRJob also sends the list of pivots to every
node that will execute a map task. In Hadoop this can be done using the