Databases Reference
In-Depth Information
3.5
Distance Measures
We now take a short detour to study the general notion of distance measures.
The Jaccard similarity is a measure of how close sets are, although it is not
really a distance measure. That is, the closer sets are, the higher the Jaccard
similarity. Rather, 1 minus the Jaccard similarity is a distance measure, as we
shall see; it is called the Jaccard distance.
However, Jaccard distance is not the only measure of closeness that makes
sense. We shall examine in this section some other distance measures that have
applications. Then, in Section 3.6 we see how some of these distance measures
also have an LSH technique that allows us to focus on nearby points without
comparing all points. Other applications of distance measures will appear when
we study clustering in Chapter 7.
3.5.1 Definition of a Distance Measure
Suppose we have a set of points, called a space. A distance measure on this
space is a function d(x, y) that takes two points in the space as arguments and
produces a real number, and satisfies the following axioms:
1. d(x, y)≥0 (no negative distances).
2. d(x, y) = 0 if and only if x = y (distances are positive, except for the
distance from a point to itself).
3. d(x, y) = d(y, x) (distance is symmetric).
4. d(x, y)≤d(x, z) + d(z, y) (the triangle inequality).
The triangle inequality is the most complex condition. It says, intuitively, that
to travel from x to y, we cannot obtain any benefit if we are forced to travel via
some particular third point z. The triangle-inequality axiom is what makes all
distance measures behave as if distance describes the length of a shortest path
from one point to another.
3.5.2 Euclidean Distances
The most familiar distance measure is the one we normally think of as “dis-
tance.” An n-dimensional Euclidean space is one where points are vectors of n
real numbers. The conventional distance measure in this space, which we shall
refer to as the L 2 -norm, is defined:
n
d([x 1 , x 2 , . . . , x n ], [y 1 , y 2 , . . . , y n ]) =
(x i
−y i ) 2
i=1
That is, we square the distance in each dimension, sum the squares, and take
the positive square root.
Search WWH ::




Custom Search