probability) is lower for orderly configurations and higher for disorderly configurations. Therefore, if we view the complete data set from an individual data point, an orderly configuration means that for most of the individual data points there are some data points close by (i.e. points that probably belong to the same cluster) and others far away. By the same reasoning, a disorderly configuration means that most of the data points are scattered randomly. So, if the entropy is evaluated at each data point, then the data point with the minimum entropy is a good candidate for the cluster centre. This may not hold if the data contain outliers, in which case the outliers should be removed before the cluster centres are determined. The next section addresses this issue further.
The entropy measure between two data points can assume any value in the range [0, 1]. It takes very low values (close to zero) for very close pairs of data points, and very high values (close to unity) for pairs separated by a distance close to the mean distance over all pairs of data points. The similarity measure $S$ is based on distance: it assumes a very large value (close to unity) for very close pairs of data points, which probably fall in the same cluster, and a very small value (close to zero) for very distant pairs, which probably fall into different clusters. The entropy at one data point with respect to another data point is defined as
\[
E = -S \log_2 S - (1 - S)\log_2 (1 - S).
\tag{10.26}
\]
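As a quick numerical check of this expression, the following minimal Python sketch evaluates it at the extremes and at the midpoint (the function name is illustrative):

```python
import math

def pair_entropy(s):
    """Entropy of a similarity value s in [0, 1], Eq. (10.26).

    By convention 0 * log2(0) = 0, so the entropy vanishes at s = 0 and s = 1.
    """
    if s <= 0.0 or s >= 1.0:
        return 0.0
    return -s * math.log2(s) - (1.0 - s) * math.log2(1.0 - s)

print(pair_entropy(0.5))  # maximum: 1.0
print(pair_entropy(0.0))  # minimum: 0.0
print(pair_entropy(1.0))  # minimum: 0.0
```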
From the above expression it can be seen that the entropy assumes its maximum value of 1.0 when the similarity $S = 0.5$, and its minimum value of 0.0 when $S = 0.0$ or $S = 1.0$ (Klir and Folger, 1988). The total entropy value at a data point $z_i$ with respect to all other data points is defined as
\[
E_i = -\sum_{\substack{j \in Z \\ j \neq i}} \left\{ S_{ij}\log_2 S_{ij} + (1 - S_{ij})\log_2 (1 - S_{ij}) \right\},
\tag{10.27}
\]
where $S_{ij}$ is the similarity between the data points $z_i$ and $z_j$, normalized to [0.0, 1.0]. It is defined as
\[
S_{ij} = e^{-\alpha D_{ij}},
\tag{10.28}
\]
where $D_{ij}$ is the distance between the data points $z_i$ and $z_j$, and $\alpha$ is a positive parameter. If the similarity is plotted against the distance, the resulting curve shows greater curvature for a larger value of $\alpha$. Experiments with various values of $\alpha$ suggest that no single fixed value is robust for all kinds of data sets. Yao et al. (2000) therefore proposed calculating the value of $\alpha$ automatically by assigning a similarity of 0.5 in Equation (10.28) when the distance between two data points equals the mean distance over all pairs of data points. This produced good results, as confirmed in various experiments (Yao et al., 2000). Mathematically, this can be expressed as
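The condition just described determines $\alpha$ directly: solving $e^{-\alpha \bar{D}} = 0.5$ for the mean pairwise distance $\bar{D}$ gives $\alpha = \ln 2 / \bar{D}$. A minimal sketch of this automatic choice, assuming Euclidean distance (the function name is illustrative):

```python
import math
from itertools import combinations

def auto_alpha(points):
    """Choose alpha so that the similarity of Eq. (10.28) equals 0.5
    when the distance equals the mean distance over all pairs:
    exp(-alpha * d_mean) = 0.5  =>  alpha = ln(2) / d_mean.
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    d_mean = sum(dists) / len(dists)
    return math.log(2.0) / d_mean

# Corners of a 3-4-5 rectangle: pairwise distances 3, 4, 5, 5, 4, 3,
# so the mean distance is exactly 4 and alpha = ln(2) / 4.
pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
alpha = auto_alpha(pts)
# By construction, a pair at exactly the mean distance has similarity 0.5.
```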