Global Positioning System Reference
In-Depth Information
the algorithm considers the following cases: (1) when the keys belong to the
same partition and same old partition, they have the same order (lines 21 to
23), (2) when the keys belong to the same partition but different old partition,
they are ordered by their window-pair sequences (lines 24 to 38), (3) when
the keys belong to different partitions but the same old partition, they are
also ordered by their window-pair sequences (lines 39 to 53), and (4) when
the keys belong to different partitions and different old partitions, they have
the same order (line 54). Figure 8b shows the order of partitions generated
by Compare_windowPair for the scenario with two pivots presented in Fig.
7. Figure 8c shows the order of partitions for the case of three pivots.
After generating the groups in a reduce node, the MapReduce
framework calls the reduce function Reduce_windowPair once for each group.
This function is presented in Algorithm 9 and receives as input a key-value
pair ( k 2 , v 2 List ). The goal of this function is to generate the window links
of the partitions that are small enough to be processed in a single node
and to store the data of larger partitions for further repartitioning. If all the
records in v 2 List belong to the same old partition, there is no possibility of
generating window links and thus the function terminates immediately
(lines 1 to 3). If the size of the list is small enough to be processed in a
single node, the algorithm calls the function InMemorySimJoinWin to get
the window links in the current partition (lines 4 to 5). Otherwise, all the
records of the group are stored in an intermediate directory for further
partitioning (lines 6 to 10). Both the intermediate records and the directory
name propagate the information of the previous partitions. This information
will enable the correct generation of window links in subsequent rounds.
Algorithm 9 Reduce_windowPair ()
Input: ( k 2 , v 2 List ) ,W 1 ,W 2 . k 2 = ( kPart , kWin , kPrevPart ), v 2 List = list( id ,
elem , part , prevPart ), W 1 and W 2 are the last two components of the
job input directory name job_inDir
Output: SJ matches or intermediate data. Intermediate data = list( k 3 , v 3).
k 3 = id , v 3 = ( id, elem, part )
1 : if all the elements of v 2 List have the same value of prevPart then
2 : return
3 : end if
4 : if sizeInBytes ( v 2 List ) memT then
5 : InMemorySimJoinWin ( v 2 List, outDir, eps )
6 : else
7 : for each element e of v 2 List do
8 : output ( e.id, ( e.id, e.elem, e.prevPart )) to
outDir /intermediate/W_ ۦ roundNum ۧ _ ۦ k 2 ۧ _ ۦ W 1 ۧ _ ۦ W 2 ۧ
9 : end for
10 : end if
Search WWH ::




Custom Search