is required. There are two main types of measures used to estimate this
relation: distance measures and similarity measures.
Many clustering methods use distance measures to determine the
similarity or dissimilarity between any pair of objects. It is useful to denote
the distance between two instances $x_i$ and $x_j$ as $d(x_i, x_j)$. A valid distance
measure should be symmetric and should attain its minimum value (usually zero)
for identical vectors. The distance measure is called a metric distance
measure if it also satisfies the following properties:
(1) Triangle inequality: $d(x_i, x_k) \le d(x_i, x_j) + d(x_j, x_k) \quad \forall x_i, x_j, x_k \in S$.
(2) $d(x_i, x_j) = 0 \Rightarrow x_i = x_j \quad \forall x_i, x_j \in S$.
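The properties above can be checked empirically on a finite sample of instances. The following is a minimal sketch, not a proof of metricity: the helper name `is_metric_on`, the tolerance `tol`, and the sample points are all assumptions introduced for illustration. It tests symmetry, the identity condition (2), and the triangle inequality (1) over every pair and triple in the sample.

```python
import math
from itertools import product

def euclidean(x, y):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_euclidean(x, y):
    """Squared Euclidean distance; symmetric, but not a metric."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def is_metric_on(points, d, tol=1e-12):
    """Return True if d behaves like a metric on the given finite sample.

    Passing this check does not prove d is a metric in general;
    failing it does prove that d is not one.
    """
    for x, y in product(points, repeat=2):
        if abs(d(x, y) - d(y, x)) > tol:      # symmetry
            return False
        if d(x, y) <= tol and x != y:         # property (2): d = 0 only for identical vectors
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # property (1): triangle inequality
            return False
    return True

# Hypothetical sample: three collinear points plus one off the line.
SAMPLE = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
```

On this sample the Euclidean distance passes, while the squared Euclidean distance fails the triangle inequality: going from (0, 0) to (2, 0) directly costs 4, but going through (1, 0) costs only 1 + 1 = 2.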
8.4.2 Minkowski: Distance Measures for Numeric Attributes
Given two $p$-dimensional instances, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ and
$x_j = (x_{j1}, x_{j2}, \ldots, x_{jp})$, the distance between the two data instances can be
calculated using the Minkowski metric:
$$d(x_i, x_j) = \left( |x_{i1} - x_{j1}|^g + |x_{i2} - x_{j2}|^g + \cdots + |x_{ip} - x_{jp}|^g \right)^{1/g}.$$
The commonly used Euclidean distance between two objects is obtained
when $g = 2$. Given $g = 1$, the sum of absolute paraxial distances (Manhattan
metric) is obtained, and with $g = \infty$ one gets the greatest of the paraxial
distances (Chebyshev metric).
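The three special cases can be illustrated with a short sketch of the Minkowski distance. The function name `minkowski` and the convention of passing `math.inf` for the limiting case are assumptions made for this example; the limit $g = \infty$ is handled separately as the maximum coordinate difference, since the general formula cannot be evaluated directly there.

```python
import math

def minkowski(x, y, g=2.0):
    """Minkowski distance of order g between two equal-length numeric vectors.

    g = 1 gives the Manhattan distance, g = 2 the Euclidean distance,
    and g = math.inf the Chebyshev distance (largest coordinate difference).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    if g < 1:
        raise ValueError("g must be at least 1")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(g):
        return max(diffs) if diffs else 0.0
    return sum(d ** g for d in diffs) ** (1.0 / g)
```

For the instances (0, 0) and (3, 4), the coordinate differences are 3 and 4, so the Manhattan distance is 3 + 4 = 7, the Euclidean distance is 5, and the Chebyshev distance is 4.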
The measurement unit used can affect the clustering analysis. To avoid
the dependence on the choice of measurement units, the data should be
standardized. Standardizing measurements attempts to give all variables an
equal weight. However, if each variable is assigned a weight according
to its importance, then the weighted distance can be computed as:
$$d(x_i, x_j) = \left( w_1 |x_{i1} - x_{j1}|^g + w_2 |x_{i2} - x_{j2}|^g + \cdots + w_p |x_{ip} - x_{jp}|^g \right)^{1/g},$$
where $w_i \in [0, \infty)$.
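Both ideas in this passage, standardization and attribute weighting, can be sketched in a few lines. The text does not say which standardization to use; the z-score (subtract the column mean, divide by the column standard deviation) is assumed here as one common choice. The function names `standardize` and `weighted_minkowski` are likewise hypothetical.

```python
import statistics

def standardize(rows):
    """Z-score each attribute (column) so that no unit of measurement dominates.

    Assumption: population standard deviation is used; a constant column
    is mapped to zeros to avoid division by zero.
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stdevs = [statistics.pstdev(c) for c in cols]
    return [
        tuple((v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]

def weighted_minkowski(x, y, w, g=2.0):
    """Weighted Minkowski distance with non-negative weights w, one per attribute."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    if any(wi < 0 for wi in w):
        raise ValueError("weights must be non-negative")
    return sum(wi * abs(a - b) ** g for wi, a, b in zip(w, x, y)) ** (1.0 / g)
```

Setting every weight to 1 recovers the unweighted Minkowski distance, and setting a weight to 0 removes that attribute from the comparison entirely. For example, with the instances (0, 0) and (3, 4), $g = 2$ and weights (1, 0), only the first attribute contributes and the distance is 3.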
8.4.2.1 Distance Measures for Binary Attributes
The distance measure described in the last section may be easily computed
for continuous-valued attributes. In the case of instances described by
categorical, binary, ordinal or mixed type attributes, the distance measure
should be revised.
In the case of binary attributes, the distance between objects may be
calculated based on a contingency table. A binary attribute is symmetric