Similarity Join for Big Geographic Data - Geographical Information Systems: Trends and Technologies

MRSimJoin using the Hadoop MapReduce framework. An extensive performance evaluation of MRSimJoin with synthetic and real-world geographic data shows that it scales very well when important parameters like epsilon, data size, and number of nodes increase. Furthermore, we show that MRSimJoin performs significantly better than an adaptation of the state-of-the-art MapReduce-based algorithm to answer arbitrary joins.
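
For readers unfamiliar with the role of epsilon, the following minimal sketch illustrates what a similarity (epsilon) join computes: every pair of records, one from each input, whose distance is at most epsilon. It is a naive single-machine nested-loop baseline, not the MRSimJoin algorithm. The function name epsilon_join, the sample points, and the choice of Euclidean distance are assumptions made for illustration only.

```python
import math
from itertools import product


def epsilon_join(r, s, eps):
    """Naive similarity join: all pairs (a, b), a in r and b in s,
    whose Euclidean distance is at most eps.

    Illustrative baseline only: it compares every pair, so its cost
    grows with len(r) * len(s).
    """
    return [(a, b) for a, b in product(r, s) if math.dist(a, b) <= eps]


# Hypothetical 2-D points standing in for geographic records.
cities = [(0.0, 0.0), (5.0, 5.0)]
stations = [(0.3, 0.4), (5.0, 6.5), (9.0, 9.0)]

close_pairs = epsilon_join(cities, stations, eps=1.0)
wider_pairs = epsilon_join(cities, stations, eps=2.0)
```

Raising eps from 1.0 to 2.0 admits the additional pair at distance 1.5, which shows why the output size, and hence the join cost, grows with epsilon.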

Our paths for future work include the study of: (1) other similarity-aware operators, e.g., kNN Join and kDistance Join, for MapReduce-based systems, (2) indexing techniques that can be exploited to implement Similarity Join operations, and (3) cloud queries with multiple similarity-based operators.

