MRSimJoin using the Hadoop MapReduce framework. An extensive
performance evaluation of MRSimJoin with synthetic and real-world
geographic data shows that it scales very well when important parameters
like epsilon, data size, and number of nodes increase. Furthermore, we show
that MRSimJoin performs significantly better than an adaptation of the
state-of-the-art MapReduce-based algorithm to answer arbitrary joins.
Our paths for future work include the study of: (1) other similarity-
aware operators, e.g., kNN Join and kDistance Join, for MapReduce-based
systems, (2) indexing techniques that can be exploited to implement
Similarity Join operations, and (3) cloud queries with multiple similarity-
based operators.
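To make the operators named above concrete, the following is a minimal single-machine sketch of their result semantics. It is not MRSimJoin and involves no MapReduce partitioning. The function names, the use of Euclidean distance, and the brute-force nested-loop evaluation are illustrative assumptions; the definitions follow the common usage of these terms (all pairs within epsilon, the k nearest neighbors of each record, and the k closest pairs overall).

```python
import heapq
import math

# Hypothetical helper names; records are assumed to be coordinate tuples
# and the distance function is assumed to be Euclidean.

def epsilon_join(R, S, eps):
    """Similarity Join: every pair (r, s) whose distance is at most eps."""
    return [(r, s) for r in R for s in S if math.dist(r, s) <= eps]

def knn_join(R, S, k):
    """kNN Join: for each r in R, its k nearest records in S."""
    return {r: heapq.nsmallest(k, S, key=lambda s: math.dist(r, s)) for r in R}

def kdistance_join(R, S, k):
    """kDistance Join: the k closest (r, s) pairs over the whole cross product."""
    pairs = ((r, s) for r in R for s in S)
    return heapq.nsmallest(k, pairs, key=lambda p: math.dist(p[0], p[1]))

if __name__ == "__main__":
    R = [(0.0, 0.0), (5.0, 5.0)]
    S = [(0.0, 1.0), (5.0, 7.0), (9.0, 9.0)]
    print(epsilon_join(R, S, 1.5))
    print(knn_join(R, S, 1))
    print(kdistance_join(R, S, 2))
```

The brute-force loops compare every pair and are only meant to pin down what each operator returns; a distributed implementation would have to avoid the full cross product.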
References
Apache Hadoop. 2013. http://hadoop.apache.org/.
Blanas, S., J.M. Patel, V. Ercegovac, J. Rao, E.J. Shekita and Y. Tian. 2010. A comparison of join
algorithms for log processing in mapreduce. In ACM SIGMOD '10, USA.
Böhm, C., B. Braunmüller, F. Krebs and H.-P. Kriegel. 2001. Epsilon grid order: an algorithm
for the similarity join on massive high-dimensional data. In ACM SIGMOD '01, USA.
Chaudhuri, S., V. Ganti and R. Kaushik. 2006. A primitive operator for similarity joins in data
cleaning. In ICDE '06, USA.
Chen, S. 2010. Cheetah: a high performance, custom data warehouse on top of mapreduce.
In VLDB '10, Singapore.
Dean, J. and S. Ghemawat. 2004. Mapreduce: simplified data processing on large clusters. In
OSDI '04, USA.
Dittrich, J.-P. and B. Seeger. 2001. Gess: a scalable similarity-join algorithm for mining large
data sets in high dimensional spaces. In ACM SIGKDD '01, USA.
Dohnal, V., C. Gennaro, P. Savino and P. Zezula. 2003a. Similarity join in metric spaces. In
ECIR '03, Italy.
Dohnal, V., C. Gennaro and P. Zezula. 2003b. Similarity join in metric spaces using ed-index.
In DEXA '03, Czech Republic.
GeoNames. 2013. http://www.geonames.org/about.html.
Gravano, L., P.G. Ipeirotis, H.V. Jagadish, N. Koudas, S. Muthukrishnan and D. Srivastava.
2001. Approximate string joins in a database (almost) for free. In VLDB '01, Italy.
Hjaltason, G.R. and H. Samet. 2003. Index-driven similarity search in metric spaces. ACM
Trans. Database Syst. 28(4): 517-580.
Jacox, E.H. and H. Samet. 2008. Metric space similarity joins. ACM Trans. Database Syst.
33(2): 7:1-7:38.
Jiang, D., A.K.H. Tung and G. Chen. 2011. Map-join-reduce: Toward scalable and efficient data
analysis on large clusters. IEEE Trans. on Knowl. and Data Eng. 23(9): 1299-1311.
Kitsuregawa, M. and Y. Ogawa. 1990. Bucket spreading parallel hash: a new, robust, parallel
hash join method for data skew in the super database computer (sdc). In VLDB '90,
Australia.
Luo, G., J.F. Naughton and C.J. Ellmann. 2002. A non-blocking parallel spatial join algorithm.
In ICDE '02, USA.
Okcan, A. and M. Riedewald. 2011. Processing theta-joins using mapreduce. In ACM SIGMOD
'11, Greece.
Patel, J.M. and D.J. DeWitt. 1996. Partition based spatial-merge join. In ACM SIGMOD '96,
Canada.