Global Positioning System Reference
In-Depth Information
Distributed File System
Final Output
Input/Intermediate Data
Node 1
Node 2
Node 3
Pivots
Pivots
Pivots
Map
Map
Map
Map
...
Shuffle
Reduce
Reduce
Reduce
Reduce
This partition was small
enough to find the links
on a single node
These partitions require
further repartitioning
Fig. 1. An MRSimJoin round.
(2008). The process generates two types of partitions: base partitions and
window-pair partitions . The records closest to a given pivot are contained
in a base partition, while the records in the boundary between two
base partitions are contained in a window-pair partition. Generally, the
window-pair records should be a superset of the records whose distance
to the hyperplane that separates the base partitions is at most ε. Since this
hyperplane does not always explicitly exist in a metric space, it is implicit
and known as a generalized hyperplane (GHP).The distance of a record t to
the GHP between two partitions with pivots P 0 and P 1 cannot always be
computed exactly, so a lower bound is used (Hjaltason and Samet 2003):
gen_hyperplane_dist(t, P 0 , P 1 ) = (dist(t,P 0 ) - dist(t,P 1 ))/2 . This distance can be
replaced with an exact distance if this can be computed.
Processing the window-pair partitions guarantees the identifi cation of
the links between records that belong to different base partitions. A round
that further repartitions, a base partition or the initial input data is referred
to as a base partition round , a round that repartitions a window-pair partition
is referred to as a window-pair partition round . At the logical level, the data
partitioning in MRSimJoin is similar to the one in the Quickjoin algorithm
(Jacox and Samet 2008). The core difference in MRSimJoin, however, is
that the partitioning of the data, the generation of the result links, and the
storage of intermediate results are performed in a fully distributed and
parallel manner.
Consider the restaurant-theatre scenario described at the beginning
of this section when dividing a geographical area into partitions. Figure 2
represents a map where the restaurants and movie theatres are located, and
 
Search WWH ::




Custom Search