h(z) ≠ h(y). However, this statement is true because whenever h(x) ≠ h(y), at least one
of h(x) and h(y) must be different from h(z). They could not both be h(z), because then
h(x) and h(y) would be the same.
3.5.4
Cosine Distance
The cosine distance makes sense in spaces that have dimensions, including Euclidean
spaces and discrete versions of Euclidean spaces, such as spaces where points are vectors
with integer components or boolean (0 or 1) components. In such a space, points may be
thought of as directions. We do not distinguish between a vector and a multiple of that
vector. Then the cosine distance between two points is the angle that the vectors to those points
make. This angle will be in the range 0 to 180 degrees, regardless of how many dimensions
the space has.
We can calculate the cosine distance by first computing the cosine of the angle, and then
applying the arc-cosine function to translate to an angle in the 0-180 degree range. Given
two vectors x and y, the cosine of the angle between them is the dot product x.y divided by
the product of the L2-norms of x and y (i.e., their Euclidean distances from the origin). Recall that the dot
product of vectors $[x_1, x_2, \ldots, x_n] . [y_1, y_2, \ldots, y_n]$ is $\sum_{i=1}^{n} x_i y_i$.
EXAMPLE 3.13 Let our two vectors be x = [1, 2, −1] and y = [2, 1, 1]. The dot product x.y is
1 × 2 + 2 × 1 + (−1) × 1 = 3. The L2-norm of both vectors is $\sqrt{6}$. For example, x has L2-norm
$\sqrt{1^2 + 2^2 + (-1)^2} = \sqrt{6}$. Thus, the cosine of the angle between x and y is $3/(\sqrt{6}\sqrt{6})$, or 1/2. The angle whose
cosine is 1/2 is 60 degrees, so that is the cosine distance between x and y.
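The calculation in Example 3.13 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `cosine_distance` and the choice to reject zero vectors (which have no direction) are assumptions made for this sketch, not something the text prescribes.

```python
import math

def cosine_distance(x, y):
    """Return the angle between vectors x and y in degrees (0 to 180)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of components")
    dot = sum(a * b for a, b in zip(x, y))       # dot product x.y
    norm_x = math.sqrt(sum(a * a for a in x))    # L2-norm of x
    norm_y = math.sqrt(sum(b * b for b in y))    # L2-norm of y
    if norm_x == 0 or norm_y == 0:
        # Assumption of this sketch: the zero vector has no direction.
        raise ValueError("cosine distance is undefined for the zero vector")
    cos_theta = dot / (norm_x * norm_y)
    # Floating-point rounding can push the ratio slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

# The vectors of Example 3.13; the result is 60 degrees up to rounding error.
print(cosine_distance([1, 2, -1], [2, 1, 1]))
```

Because the function divides by both norms, scaling either vector by a positive constant leaves the result unchanged, which matches the view of points as directions.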
We must show that the cosine distance is indeed a distance measure. We have defined it
so the values are in the range 0 to 180, so no negative distances are possible. Two vectors
have angle 0 if and only if they have the same direction. Symmetry is obvious: the angle
between x and y is the same as the angle between y and x . The triangle inequality is best
argued by physical reasoning. One way to rotate from x to y is to rotate to z and thence to
y . The sum of those two rotations cannot be less than the rotation directly from x to y .
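The rotation argument can be spot-checked numerically. The following sketch draws random triples of vectors and counts how often the direct angle from x to y exceeds the sum of the angles via z. It is an empirical sanity check under assumed parameters (Gaussian components, five dimensions, a small tolerance for rounding), not a proof; the names `angle_deg` and `count_triangle_violations` are hypothetical.

```python
import math
import random

def angle_deg(x, y):
    """Angle between nonzero vectors x and y, in degrees."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))

def count_triangle_violations(trials=1000, dim=5, seed=0, tol=1e-6):
    """Count random triples (x, y, z) with angle(x, y) > angle(x, z) + angle(z, y)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        x, y, z = ([rng.gauss(0, 1) for _ in range(dim)] for _ in range(3))
        if angle_deg(x, y) > angle_deg(x, z) + angle_deg(z, y) + tol:
            violations += 1
    return violations

print(count_triangle_violations())
```

A count of zero is what the triangle inequality predicts; the tolerance only absorbs floating-point error in the arc-cosine.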
3.5.5
Edit Distance
This distance makes sense when points are strings. The distance between two strings
$x = x_1 x_2 \cdots x_n$ and $y = y_1 y_2 \cdots y_m$ is the smallest number of insertions and deletions of single
characters that will convert x to y.
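One way to compute this distance is a standard dynamic program over prefixes of the two strings. The sketch below is an illustration under the definition above (insertions and deletions only, no substitutions); the function name `edit_distance` and the sample strings are chosen for this example.

```python
def edit_distance(x, y):
    """Smallest number of single-character insertions and deletions turning x into y."""
    n, m = len(x), len(y)
    # prev[j] is the distance between the current prefix of x and y[:j].
    # Converting the empty prefix of x to y[:j] takes j insertions.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        # Converting x[:i] to the empty string takes i deletions.
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                # Matching characters cost nothing.
                curr[j] = prev[j - 1]
            else:
                # Either delete x[i-1] or insert y[j-1].
                curr[j] = 1 + min(prev[j], curr[j - 1])
        prev = curr
    return prev[m]

# Delete "b", then insert "f" and "g": three edits in total.
print(edit_distance("abcde", "acfdeg"))
```

Since substitutions are not allowed, swapping one character for another costs two edits (a deletion plus an insertion), so `edit_distance("ab", "ba")` is 2 rather than 1.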