Databases Reference
In-Depth Information
It is easy to verify the first three requirements for a distance measure are
satisfied. The Euclidean distance between two points cannot be negative, be-
cause the positive square root is intended. Since all squares of real numbers are
nonnegative, any i such that x i = y i forces the distance to be strictly positive.
On the other hand, if x i = y i for all i, then the distance is clearly 0. Symmetry
follows because (x i
−x i ) 2 . The triangle inequality requires a good
deal of algebra to verify. However, it is well understood to be a property of
Euclidean space: the sum of the lengths of any two sides of a triangle is no less
than the length of the third side.
There are other distance measures that have been used for Euclidean spaces.
For any constant r, we can define the L r -norm to be the distance measure d
defined by:
−y i ) 2
= (y i
n
| r ) 1/r
d([x 1 , x 2 , . . . , x n ], [y 1 , y 2 , . . . , y n ]) = (
|x i
−y i
i=1
The case r = 2 is the usual L 2 -norm just mentioned. Another common distance
measure is the L 1 -norm, or Manhattan distance. There, the distance between
two points is the sum of the magnitudes of the differences in each dimension.
It is called “Manhattan distance” because it is the distance one would have to
travel between points if one were constrained to travel along grid lines, as on
the streets of a city such as Manhattan.
Another interesting distance measure is the L -norm, which is the limit
as r approaches infinity of the L r -norm. As r gets larger, only the dimension
with the largest difference matters, so formally, the L -norm is defined as the
maximum of|x i −y i |over all dimensions i.
Example 3.12 : Consider the two-dimensional Euclidean space (the custom-
ary plane) and the points (2, 7) and (6, 4).
The L 2 -norm gives a distance
of
4 2 + 3 2 = 5. The L 1 -norm gives a distance of
|2−6|+|7−4|= 4 + 3 = 7. The L -norm gives a distance of
(2−6) 2 + (7−4) 2
=
max(|2−6|,|7−4|) = max(4, 3) = 4
2
3.5.3
Jaccard Distance
As mentioned at the beginning of the section, we define the Jaccard distance
of sets by d(x, y) = 1− SIM (x, y). That is, the Jaccard distance is 1 minus the
ratio of the sizes of the intersection and union of sets x and y. We must verify
that this function is a distance measure.
1. d(x, y) is nonnegative because the size of the intersection cannot exceed
the size of the union.
Search WWH ::




Custom Search