Database Reference
In-Depth Information
Euclidean space always exists and is a point in the space. However, consider the space of sets for which we defined
the Jaccard distance. The notion of the “average” of two sets makes no sense. Likewise, the space of strings, where
we can use the edit distance, does not let us take the “average” of strings.
Vector spaces, for which we suggested the cosine distance, may or may not be Euclidean. If the components of the
vectors can be any real numbers, then the space is Euclidean. However, if we restrict components to be integers, then
the space is not Euclidean. Notice that, for instance, we cannot find an average of the vectors [ 1 , 2 ] and [ 3 , 1 ] in the
space of vectors with two integer components, although if we treated them as members of the two-dimensional Euc-
lidean space, then we could say that their average was [2.0, 1.5].
3.5.6
Hamming Distance
Given a space of vectors, we define the Hamming distance between two vectors to be the
number of components in which they differ. It should be obvious that Hamming distance
is a distance measure. Clearly the Hamming distance cannot be negative, and if it is zero,
then the vectors are identical. The distance does not depend on which of two vectors we
consider first. The triangle inequality should also be evident. If x and z differ in m com-
ponents, and z and y differ in n components, then x and y cannot differ in more than m +
n components. Most commonly, Hamming distance is used when the vectors are boolean;
they consist of 0's and 1's only. However, in principle, the vectors can have components
from any set.
EXAMPLE 3.16 The Hamming distance between the vectors 10101 and 11110 is 3. That is,
these vectors differ in the second, fourth, and fifth components, while they agree in the first
and third components.
3.5.7
Exercises for Section 3.5
! EXERCISE 3.5.1 On the space of nonnegative integers, which of the following functions
are distance measures? If so, prove it; if not, prove that it fails to satisfy one or more of the
axioms.
(a) max( x, y ) = the larger of x and y .
(b) diff( x, y ) = | x y | (the absolute magnitude of the difference between x and y ).
(c) sum( x, y ) = x + y .
EXERCISE 3.5.2 Find the L 1 and L 2 distances between the points (5 , 6 , 7) and (8 , 2 , 4).
Search WWH ::




Custom Search