Algorithm 1: MRSimJoin(inDir, outDir, numPiv, eps, memT)
Input: inDir (input directory with the records of datasets R and S),
outDir (output directory), numPiv (number of pivots), eps (epsilon),
memT (memory threshold)
Output: outDir contains all the results of the Similarity Join operation
$R \bowtie_{(r,s)} S$
 1: intermDir ← outDir + "/intermediate"
 2: roundNum ← 0
 3: while true do
 4:   if roundNum = 0 then
 5:     job_inDir ← inDir
 6:   else
 7:     job_inDir ← GetNextIntermPartitionDir(intermDir)
 8:   end if
 9:   if job_inDir = null then
10:     break
11:   end if
12:   pivots ← GeneratePivots(job_inDir, numPiv)
13:   if isBaseRound(job_inDir) then
14:     MRJob(Map_base, Reduce_base, Partition_base, Compare_base,
          job_inDir, outDir, pivots, numPiv, eps, memT, roundNum)
15:   else
16:     MRJob(Map_windowPair, Reduce_windowPair,
          Partition_windowPair, Compare_windowPair, job_inDir,
          outDir, pivots, numPiv, eps, memT, roundNum)
17:   end if
18:   roundNum++
19:   if roundNum > 1 then
20:     RenameFromIntermToProcessed(job_inDir)
21:   end if
22: end while
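The round loop of the main routine can be sketched as a small in-memory simulation. This is a minimal sketch under loud assumptions, not the MRSimJoin implementation: directories are modeled as Python lists (intermDir as a FIFO queue of partitions, the processed directories as a plain list), pivot generation and the base/window-pair distinction are omitted, and both MRJob variants are collapsed into one hypothetical `run_round` callback that returns the partitions still too large for the single-node SJ routine. The names `run_rounds`, `make_toy_round`, `RoundFn` and `Partition` are illustrative only. As in the prose, a job input is relocated only when it came from intermDir, never when it is the original input.

```python
from collections import deque
from typing import Callable, Deque, List, Optional

# Hypothetical model: a partition is just a list of records (ints here).
Partition = List[int]
# A round consumes one partition and returns the partitions that are still
# too large for the single-node SJ routine (the ones needing repartitioning).
RoundFn = Callable[[Partition], List[Partition]]


def run_rounds(in_data: Partition, run_round: RoundFn) -> List[Partition]:
    """Mirror the round loop of the main routine; return the relocated inputs."""
    intermediate: Deque[Partition] = deque()  # stands in for intermDir
    processed: List[Partition] = []           # stands in for the processed dirs
    round_num = 0
    while True:
        job_in: Optional[Partition]
        if round_num == 0:
            job_in = in_data                  # first round: original input
        else:
            # Stand-in for GetNextIntermPartitionDir: None when nothing is left.
            job_in = intermediate.popleft() if intermediate else None
        if job_in is None:
            break
        # Stand-in for MRJob: oversized output partitions go to intermDir.
        intermediate.extend(run_round(job_in))
        round_num += 1
        if round_num > 1:                     # the original input is not relocated
            processed.append(job_in)
    return processed


def make_toy_round(mem_t: int, joined: List[Partition]) -> RoundFn:
    """Toy round (hypothetical): split the input in half; halves with at most
    mem_t records are 'joined' in memory (recorded in `joined`), larger halves
    are returned for another round. Requires mem_t >= 1 to terminate."""
    def toy_round(part: Partition) -> List[Partition]:
        mid = len(part) // 2
        oversized: List[Partition] = []
        for half in (part[:mid], part[mid:]):
            if len(half) <= mem_t:
                joined.append(half)
            else:
                oversized.append(half)
        return oversized
    return toy_round
```

For example, with `joined = []`, calling `run_rounds(list(range(8)), make_toy_round(2, joined))` executes three rounds: the first splits the input into two partitions of four records, and the next two split each of those into partitions of two records that are joined in memory. The call returns the two intermediate partitions `[[0, 1, 2, 3], [4, 5, 6, 7]]`, while the original input is left in place.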
If the job input is an intermediate partition generated in a previous round,
then after the MapReduce job finishes the main routine renames the job input
directory so that it is relocated under the processed directories (line 20).
Figure 4 shows an example of the rounds executed by the main routine. Each
node MR_N represents a MapReduce job, and the figure also shows the
partitions generated by each job. Light gray partitions are small partitions
that are processed with the single-node SJ routine; dark gray partitions
require additional repartitioning. One possible sequence of rounds is MR_1,
MR_2, MR_3, MR_4, MR_5 and MR_6. The original input data is always processed
in the first round. Because the links of each partition can be computed
independently of the other partitions, the routine produces a correct result
regardless of the order in which the rounds are executed.
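The order-independence argument can be illustrated with a small sketch that computes the links of each partition with a nested-loop single-node join and takes their union under different processing orders. It is a simplified model, not the MRSimJoin reduce logic: records are integers, the distance is assumed to be |r - s| (MRSimJoin itself targets general metric spaces), each partition is a hypothetical (R_i, S_i) pair, and the partitioning is assumed to place every qualifying pair in some partition. The names `single_node_sj`, `links_in_order` and `PartitionPair` are illustrative only.

```python
from itertools import product
from typing import List, Set, Tuple

Link = Tuple[int, int]
# Hypothetical model: each partition holds its R records and its S records.
PartitionPair = Tuple[List[int], List[int]]


def single_node_sj(r_part: List[int], s_part: List[int], eps: int) -> Set[Link]:
    """Nested-loop similarity join: all (r, s) with |r - s| <= eps (assumed distance)."""
    return {(r, s) for r, s in product(r_part, s_part) if abs(r - s) <= eps}


def links_in_order(parts: List[PartitionPair], eps: int, order: List[int]) -> Set[Link]:
    """Process the partitions in the given order and union their links."""
    links: Set[Link] = set()
    for i in order:
        r_part, s_part = parts[i]
        # Each partition's links depend only on that partition's records.
        links |= single_node_sj(r_part, s_part, eps)
    return links
```

With `parts = [([1, 3], [2]), ([10], [11, 20]), ([30], [31])]` and `eps = 1`, both `links_in_order(parts, 1, [0, 1, 2])` and `links_in_order(parts, 1, [2, 1, 0])` return `{(1, 2), (3, 2), (10, 11), (30, 31)}`. Since each partition's links are computed from that partition alone and set union is commutative and associative, the order of processing cannot change the final result.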