Algorithm 1: MRSimJoin(inDir, outDir, numPiv, eps, memT)
Input: inDir (input directory with the records of datasets R and S), outDir (output directory), numPiv (number of pivots), eps (epsilon), memT (memory threshold)
Output: outDir contains all the results of the Similarity Join operation $R \bowtie_{\theta_{\epsilon}(r,s)} S$
1:  intermDir ← outDir + "/intermediate"
2:  roundNum ← 0
3:  while true do
4:    if roundNum = 0 then
5:      job_inDir ← inDir
6:    else
7:      job_inDir ← GetNextIntermPartitionDir(intermDir)
8:    end if
9:    if job_inDir = null then
10:     break
11:   end if
12:   pivots ← GeneratePivots(job_inDir, numPiv)
13:   if isBaseRound(job_inDir) then
14:     MRJob(Map_base, Reduce_base, Partition_base, Compare_base, job_inDir, outDir, pivots, numPiv, eps, memT, roundNum)
15:   else
16:     MRJob(Map_windowPair, Reduce_windowPair, Partition_windowPair, Compare_windowPair, job_inDir, outDir, pivots, numPiv, eps, memT, roundNum)
17:   end if
18:   roundNum++
19:   if roundNum > 1 then
20:     RenameFromIntermToProcessed(job_inDir)
21:   end if
22: end while
generated partition, after the MapReduce job finishes, the main routine
renames the job input directory to relocate it under the processed directories
(line 20).
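The control flow of the main routine can be sketched in Python as a single-process simulation. This is a minimal sketch under stated assumptions, not the MRSimJoin implementation: directories are plain names held in memory, `InMemoryFS` is a hypothetical stand-in for the intermediate and processed directories, and the hypothetical `run_job` callback replaces GeneratePivots, isBaseRound and MRJob, returning the names of any partitions that still need repartitioning. The rename guard is written so that the original input directory, consumed in round 0, is never relocated, since only intermediate partitions move to the processed directories.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Optional


class InMemoryFS:
    """Hypothetical stand-in for intermDir and the processed directories."""

    def __init__(self) -> None:
        self.interm: deque[str] = deque()  # intermediate partitions not yet processed
        self.processed: list[str] = []     # partitions already consumed by a round

    def get_next_interm_partition_dir(self) -> Optional[str]:
        # Mirrors GetNextIntermPartitionDir: None once no partition remains.
        return self.interm[0] if self.interm else None

    def rename_from_interm_to_processed(self, d: str) -> None:
        # Mirrors RenameFromIntermToProcessed: move the directory out of intermDir.
        self.interm.remove(d)
        self.processed.append(d)


def mr_sim_join(in_dir: str, fs: InMemoryFS,
                run_job: Callable[[str, int], list[str]]) -> int:
    """Round loop of the main routine; returns the number of rounds executed.

    run_job(job_in_dir, round_num) is a hypothetical stand-in for MRJob: it
    returns the partitions that must be repartitioned in later rounds.
    """
    round_num = 0
    while True:
        job_in_dir = in_dir if round_num == 0 else fs.get_next_interm_partition_dir()
        if job_in_dir is None:
            break
        fs.interm.extend(run_job(job_in_dir, round_num))
        round_num += 1
        if round_num > 1:  # only intermediate partitions are relocated
            fs.rename_from_interm_to_processed(job_in_dir)
    return round_num


# Hypothetical repartitioning tree: the input yields two large partitions,
# and P1 must be repartitioned once more; every other partition is small.
children = {"input": ["P1", "P2"], "P1": ["P1.1"]}
fs = InMemoryFS()
rounds = mr_sim_join("input", fs, lambda d, r: children.get(d, []))
print(rounds, fs.processed)  # 4 ['P1', 'P2', 'P1.1']
```

In this made-up run, four rounds are executed: one over the original input and one per intermediate partition, and only the three intermediate partitions end up in the processed list.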
Figure 4 shows an example of the rounds that are executed by the
main routine. Each node MR_N represents a MapReduce job. The figure
also shows the partitions generated by each job. Light gray partitions are
small partitions that are processed by running the single-node SJ routine.
Dark gray partitions require additional repartitioning. A
sample sequence of rounds is: MR_1, MR_2, MR_3, MR_4, MR_5 and MR_6. The
original input data is always processed in the first round. Since the links
(join result pairs) of any partition can be obtained independently of the
other partitions, the routine generates a correct result regardless of the
order in which the rounds are executed.
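The order-independence argument can be checked with a small in-memory experiment. The sketch below is an illustrative toy built on assumptions, not the paper's code: records are 1-D numbers with |x - y| as the metric, pivots are sampled from each partition, and a record is copied into the window pair of pivots i and j when (d(o, P_j) - d(o, P_i)) / 2 <= eps. This generalized-hyperplane rule is assumed here because the triangle inequality guarantees that both ends of every cross-partition link satisfy it. Each window-pair level adds a side label, and a link is reported only when its two records differ at every level, so each link is emitted exactly once. Partitions are taken FIFO, LIFO or at random. The names `split`, `sim_join`, `mem_t` and `max_depth` are hypothetical, and `max_depth` is an assumed safeguard for partitions that stop shrinking.

```python
from __future__ import annotations

import random


def dist(a: float, b: float) -> float:
    return abs(a - b)  # assumed 1-D metric; any metric distance would work


def valid_link(a, b, eps: float) -> bool:
    """R-S pair within eps whose window-pair side labels differ at every level."""
    (src_a, _, xa, sides_a), (src_b, _, xb, sides_b) = a, b
    return (src_a != src_b and dist(xa, xb) <= eps
            and all(p != q for p, q in zip(sides_a, sides_b)))


def split(records, num_piv: int, eps: float, rng: random.Random):
    """Repartition around sampled pivots into base and window-pair partitions."""
    values = sorted({rec[2] for rec in records})
    pivots = rng.sample(values, min(num_piv, len(values)))
    base = [[] for _ in pivots]
    windows: dict[tuple[int, int], list] = {}
    for rec in records:
        d = [dist(rec[2], p) for p in pivots]
        i = min(range(len(pivots)), key=d.__getitem__)  # closest pivot
        base[i].append(rec)
        for j in range(len(pivots)):
            # Record lies within eps of the hyperplane between pivots i and j.
            if j != i and (d[j] - d[i]) / 2 <= eps:
                side = 0 if i < j else 1  # which end of the window pair
                windows.setdefault((min(i, j), max(i, j)), []).append(
                    rec[:3] + (rec[3] + (side,),))
    return [p for p in base if p] + list(windows.values())


def sim_join(R, S, eps, num_piv, mem_t, order, seed=0, max_depth=8):
    """Process partitions in the given order; returns the list of (r, s) links."""
    rng = random.Random(seed)
    root = ([("R", i, x, ()) for i, x in enumerate(R)]
            + [("S", j, y, ()) for j, y in enumerate(S)])
    pending = [(root, 0)]  # (records, depth) of partitions awaiting a round
    links = []
    while pending:
        if order == "fifo":
            k = 0
        elif order == "lifo":
            k = len(pending) - 1
        else:
            k = rng.randrange(len(pending))
        records, depth = pending.pop(k)
        if len(records) <= mem_t or depth >= max_depth:
            # Small partition: single-node nested-loop join.
            links += [(a[1], b[1]) for a in records for b in records
                      if a[0] == "R" and valid_link(a, b, eps)]
        else:
            pending += [(p, depth + 1) for p in split(records, num_piv, eps, rng)]
    return links


def brute_force(R, S, eps):
    return {(i, j) for i, x in enumerate(R) for j, y in enumerate(S)
            if dist(x, y) <= eps}


gen = random.Random(42)
R = [gen.uniform(0, 100) for _ in range(40)]
S = [gen.uniform(0, 100) for _ in range(40)]
for order in ("fifo", "lifo", "random"):
    links = sim_join(R, S, eps=1.5, num_piv=4, mem_t=10, order=order)
    print(order, len(links) == len(set(links)), set(links) == brute_force(R, S, 1.5))
```

Because every link is owned by exactly one leaf partition, the order in which pending partitions are popped changes only the schedule of work, not the set of links produced.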