The Improved Partition Entropy Coefficient
Jian Mei Chen
School of Mechanical Engineering,
Hunan University of International Economics,
Changsha, China, 410205
jianmeich@163.com
Abstract. This paper proposes an improved partition entropy coefficient (IPE)
index that makes use of the tendency of the partition entropy coefficient (PE)
index to increase as the cluster number increases. The IPE index is compared
with the PE index and two existing cluster validity indexes on four real data
sets. Experimental results show that IPE can identify the cluster number
underlying a data set in cases where the PE index cannot, and that it
outperforms the two existing cluster validity indexes.
Keywords: partition entropy coefficient, fuzzy c-means, cluster number.
1 Introduction
Bezdek proposed the partition entropy coefficient (PE), which measures the amount of
overlap among clusters [1]. The range of values for PE is [0, log_a c], where c is the
cluster number and a is the base of the logarithm. The closer the value of PE is to 0,
the harder (crisper) the clustering is. On the other hand, the closer the value of PE
is to log_a c, the fuzzier the clustering is. Values close to log_a c indicate either
the absence of any clustering structure in the data set X or that the adopted
clustering algorithm failed to unravel it [2].
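The two ends of the range can be checked numerically. The sketch below assumes the standard definition PE = -(1/N) Σ_j Σ_i u_ij log_a u_ij, where u_ij is the membership of sample j in cluster i and N is the number of samples; the text above does not spell this formula out, and the names partition_entropy and U are illustrative only.

```python
import math

def partition_entropy(U, base=math.e):
    """Partition entropy of a fuzzy partition (assumed standard definition).

    U is a list of c rows (one per cluster), each holding N memberships;
    every column is assumed to sum to 1.
    """
    n_samples = len(U[0])
    total = 0.0
    for row in U:
        for u in row:
            # Terms with u == 0 contribute nothing (limit of u*log(u) is 0).
            if u > 0:
                total += u * math.log(u)
    # Dividing by log(base) converts the natural logarithm to base a.
    return -total / (n_samples * math.log(base))

# Hard partition: every sample belongs fully to one cluster.
crisp = [[1.0, 0.0], [0.0, 1.0]]
# Maximally fuzzy partition: c = 3 clusters, all memberships equal to 1/c.
uniform = [[1.0 / 3.0] * 4 for _ in range(3)]

pe_crisp = partition_entropy(crisp)
pe_uniform = partition_entropy(uniform, base=3)
```

With a = 3 and c = 3 the upper bound log_a c equals 1, so pe_uniform sits at the top of the range while pe_crisp sits at the bottom.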
A disadvantage of the PE index is that it depends on c and tends to increase as c
increases. Thus, when it is employed to identify the number of clusters, one has to
seek significant knees in the plot of PE versus c. Moreover, it is also sensitive to
the fuzzifier m of the fuzzy c-means clustering algorithm. It can be shown that as
m → 1+, PE tends to 0 for all c, that is, it is unable to discriminate between
different values of c. On the other hand, as m → +∞, PE tends to log_a c and
exhibits the most significant knee at c = 2 [3].
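The effect of the fuzzifier can be illustrated with a small sketch. It assumes the usual fuzzy c-means membership update u_ij = 1 / Σ_k (d_ij / d_kj)^(2/(m-1)) and, as a simplification, keeps the cluster centers fixed instead of running the full algorithm to convergence. The one-dimensional data, the centers, and all function names are hypothetical.

```python
import math

def partition_entropy(U, base=math.e):
    """Assumed standard definition: -(1/N) * sum of u*log_a(u) over all entries."""
    n_samples = len(U[0])
    total = sum(u * math.log(u) for row in U for u in row if u > 0)
    return -total / (n_samples * math.log(base))

def fcm_memberships(points, centers, m):
    """Memberships for fixed centers using the usual FCM update (requires m > 1)."""
    exponent = 2.0 / (m - 1.0)
    U = [[0.0] * len(points) for _ in centers]
    for j, x in enumerate(points):
        dists = [abs(x - v) for v in centers]
        zeros = [i for i, d in enumerate(dists) if d == 0.0]
        if zeros:
            # A point lying exactly on a center gets crisp membership there.
            for i in zeros:
                U[i][j] = 1.0 / len(zeros)
            continue
        for i, d_i in enumerate(dists):
            U[i][j] = 1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
    return U

# Hypothetical 1-D data with three loose groups and hand-picked centers.
points = [0.0, 0.2, 0.9, 1.1, 2.0, 2.3]
centers = [0.1, 1.0, 2.15]

pe_small_m = partition_entropy(fcm_memberships(points, centers, m=1.05))
pe_mid_m = partition_entropy(fcm_memberships(points, centers, m=2.0))
pe_large_m = partition_entropy(fcm_memberships(points, centers, m=100.0))
upper_bound = math.log(len(centers))  # log_a c with a = e
```

For this toy setup the three values increase with m, from nearly 0 at m = 1.05 toward upper_bound at m = 100, which is the behaviour described above.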
The above disadvantages of PE may result in multiple significant knees in the plot
of PE versus c when PE does not increase strictly as c increases, and in no
significant knee when PE increases strictly as c increases. Thus, it is hard for
users to determine the number of clusters in these cases. On the other hand, the PE
index is simple and easy to compute. If its disadvantages can be avoided, it may be
a good cluster validity index for identifying the cluster number.
This paper is devoted to overcoming PE's disadvantages by turning them into
advantages. Experiments show that PE increases sharply as c increases when the base a of