Global Positioning System Reference
In-Depth Information
Fig. 13. Increasing Number of Nodes and SF-GeoNames. [Plot: GeoNames, Eps: 2; series: MRSimJoin, MRThetaJoin, Ideal; x-axis: (Scale Factor, Number of Nodes), from (SF1,2) to (SF5,10).]
Fig. 14. Increasing Number of Pivots-GeoNames. [Plot: GeoNames, SF5, Eps: 2; series: NumRounds, Execution Time, Polynomial Trendline (Exec. Time); x-axis: Number of Pivots, 15 to 285.]
Conclusions and Future Work
MapReduce-based systems have become a crucial component to efficiently
process and analyze the large amounts of geographical data currently
available in many commercial and scientific organizations. The Similarity
Join is recognized as one of the most useful data analysis operations and has
been used in many application scenarios. While multiple implementation
techniques have been proposed for the Similarity Join, very little work has
addressed the study of MapReduce-based Similarity Joins for geographical
data. This chapter focuses on the study, design, and implementation
techniques of MRSimJoin, a MapReduce-based Similarity Join algorithm
that can be used with geographical data and distance functions. MRSimJoin
iteratively partitions the data until the partitions are small enough to be
efficiently processed in a single node. Each iteration executes a MapReduce
job that processes the generated partitions in parallel. We implemented
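
The iterative partitioning idea summarized above can be sketched on a single machine as follows. This is only a minimal illustration under stated assumptions, not the chapter's implementation: it runs in one process rather than as MapReduce jobs, it assumes 2-D points with Euclidean distance, and it uses a simplified pivot-based partitioning in which each record is copied to every pivot whose distance is within 2*eps of the distance to its nearest pivot. All names (`iterative_sim_join`, `partition_round`, `max_partition_size`, `num_pivots`) are hypothetical.

```python
import math
import random
from itertools import combinations


def brute_force_join(records, eps):
    """Nested-loop join for a partition small enough for a single node."""
    return {
        (i, j) if i < j else (j, i)
        for (i, p), (j, q) in combinations(records, 2)
        if math.dist(p, q) <= eps
    }


def partition_round(records, eps, num_pivots, rng):
    """One partitioning step (stands in for one MapReduce job).

    Assumed scheme: a record goes to every pivot q with
    d(x, q) <= d(x, nearest pivot) + 2 * eps. By the triangle inequality,
    any two records within eps of each other then share a partition.
    """
    pivots = [point for _, point in rng.sample(records, num_pivots)]
    parts = [[] for _ in pivots]
    for rec in records:
        dists = [math.dist(rec[1], pv) for pv in pivots]
        nearest = min(dists)
        for k, d in enumerate(dists):
            if d <= nearest + 2 * eps:
                parts[k].append(rec)
    return parts


def iterative_sim_join(points, eps, max_partition_size=50, num_pivots=4, seed=0):
    """Return (set of index pairs within eps, number of rounds executed)."""
    rng = random.Random(seed)
    pending = [list(enumerate(points))]  # records are (id, point) tuples
    result, rounds = set(), 0
    while pending:
        rounds += 1
        next_pending = []
        for part in pending:  # a real system would process these in parallel
            if len(part) <= max_partition_size:
                result |= brute_force_join(part, eps)
                continue
            subparts = partition_round(part, eps, min(num_pivots, len(part)), rng)
            if any(len(sub) == len(part) for sub in subparts):
                # Partitioning made no progress; join this partition directly.
                result |= brute_force_join(part, eps)
            else:
                next_pending.extend(subparts)
        pending = next_pending
    return result, rounds
```

Because records near partition boundaries are replicated, the same pair can be found in more than one partition; collecting results in a set removes those duplicates. The fallback to a direct join when a round fails to shrink a partition is a design choice of this sketch to guarantee termination.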