Databases Reference
In-Depth Information
for that—but we still see a few recurrent patterns. One is the ubiquitous query to find
things within a radius of a point.
The typical use is something like finding apartments for rent within a radius of the
center of a zip code, filtering “matches” on a dating site, and so on. Suppose you have
the following table:
CREATE TABLE locations (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(30),
lat FLOAT NOT NULL,
lon FLOAT NOT NULL
);
INSERT INTO locations(name, lat, lon)
VALUES('Charlottesville, Virginia', 38.03, −78.48),
('Chicago, Illinois', 41.85, −87.65),
('Washington, DC', 38.89, −77.04);
The latitude and longitude are in degrees, and the queries usually use the great-circle
(Haversine) formula to find the distance along the surface of the Earth, assuming that
it is a sphere. The formula to find the distance between point A and point B, with
coordinates latA and lonA , latB , and lonB in radians, can be expressed as follows:
ACOS(
COS( latA ) * COS( latB ) * COS( lonA - lonB )
+ SIN( latA ) * SIN( latB )
)
The result is in radians, so to find the distance in miles or kilometers, it needs to be
multiplied by the Earth's radius, which is about 3,959 miles or 6,371 kilometers. Sup-
pose we want to find all points within 100 miles of Charlottesville, where Baron lives.
We need to convert the latitudes and longitudes to radians to plug them into the
formula:
SELECT * FROM locations WHERE 3979 * ACOS(
COS(RADIANS(lat)) * COS(RADIANS(38.03)) * COS(RADIANS(lon) - RADIANS(-78.48))
+ SIN(RADIANS(lat)) * SIN(RADIANS(38.03))
) <= 100;
+----+---------------------------+-------+--------+
| id | name | lat | lon |
+----+---------------------------+-------+--------+
| 1 | Charlottesville, Virginia | 38.03 | −78.48 |
| 3 | Washington, DC | 38.89 | −77.04 |
+----+---------------------------+-------+--------+
This type of query not only can't use an index, but will also burn a ton of CPU cycles
and load the server very heavily. We've seen it many times. What can we do about it?
There are several aspects of this design that can be optimized. The first is to decide
whether the precision is really necessary. There's a lot of inherent imprecision:
• Locations might be within 100 miles “as the crow flies” across the surface of the
Earth, but that's really not closely related to their practical distance. No matter
 
Search WWH ::




Custom Search