Global Positioning System Reference
In-Depth Information
join algorithms. A hash-based algorithm to address the case of data skew
is presented by Schneider and DeWitt (1989). The algorithm dynamically
allocates partitions to the processing units with the goal of assigning the
same data volume to each unit.
Also related is the work on spatial join. This operation combines two
datasets based on some spatial relationship between the objects (usually
based on the intersection or containment of objects). Some work focuses
on solving this problem in a single-node database (e.g., Patel and DeWitt
1996). More relevant to this chapter is the work on parallel spatial joins
that propose algorithms to implement this operation on parallel spatial
databases (e.g., Luo et al. 2002; Patel and DeWitt 2000). Some techniques
to solve the spatial joins can be extended to support similarity joins (e.g.,
Luo et al. 2002). A key difference of this work with the one in this chapter,
is that we focus on the similarity join problem on the highly distributed and
highly fault-tolerant MapReduce framework, which is widely considered
as one of the key frameworks for processing big data.
Similarity Joins Using MapReduce
This section presents preliminary information (geographic data and distance
functions, and MapReduce) and then presents MRSimJoin.
Geographic Data and Distance Functions
In this work, we assume that the geographic data we need to process
uses the common geographic coordinates: latitude ( φ ) and longitude ( λ ).
These coordinates allow us to specify any point on the earth using a pair
of numbers. Several geographical distance functions have been defi ned
in terms of latitude and longitude to compute the distance between two
points on earth, e.g., Euclidean distance on a plane where a Spherical or
Ellipsoidal Earth was projected, Great-circle distance, Tunnel distance,
etc. The algorithm presented in this section can be used with any (metric)
distance function. Our presentation, however, considers the case of
Euclidean distance on a plane where a Spherical Earth was projected using
equirectangular projection (Snyder 1993).This distance function is fast
to compute, is reasonably accurate for small distances, and is defi ned as
follows. Given two points r 1 =( φ 1 1 ) and r 2 =( φ 2 2 ), the geographical distance
between r 1 and r 2 , as measured along the surface of the earth, is given by
the following expression:
݃݁݋ܦ݅ݏݐሺݎ ǡݎ ሻൌܴටሺο ൅ሺሺ߮ ሻο ,
Search WWH ::




Custom Search