State of the Art and Development Trend - Computational Intelligence in Time Series Forecasting

$$\alpha = -\frac{\log_e 0.5}{\bar{D}} \tag{10.29}$$

where $\bar{D}$ is the mean distance between pairs of data points in the hyperspace. Hence $\bar{D}$, and with it $\alpha$, is determined by the data and can be calculated automatically.
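A minimal sketch of how α could be computed from the data follows. It assumes Euclidean distances and an exponential similarity S_ij = exp(−α D_ij). Under that form, Eq. (10.29) gives similarity 0.5 to any pair lying exactly at the mean distance D̄. The function names are hypothetical.

```python
from __future__ import annotations

import math
from itertools import combinations


def mean_pairwise_distance(points: list[list[float]]) -> float:
    """D-bar: mean Euclidean distance over all distinct pairs of points."""
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    return sum(dists) / len(dists)


def alpha_from_data(points: list[list[float]]) -> float:
    """Eq. (10.29): alpha = -ln(0.5) / D-bar (positive, since ln 0.5 < 0)."""
    return -math.log(0.5) / mean_pairwise_distance(points)


def similarity_matrix(points: list[list[float]]) -> list[list[float]]:
    """S[i][j] = exp(-alpha * D[i][j]); the exponential form is an assumption here."""
    alpha = alpha_from_data(points)
    return [[math.exp(-alpha * math.dist(p, q)) for q in points] for p in points]


pts = [[0.0, 0.0], [3.0, 4.0]]
S = similarity_matrix(pts)
print(round(S[0][1], 6))  # 0.5: the only pair sits exactly at the mean distance of 5
```

No tuning parameter is involved. The scale of the similarity adapts to the spread of the data because α is tied to D̄.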

10.5.2.2 Fuzzy Clustering Based on Entropy Measure

In order to determine the first cluster centre, the entropy at each data point is evaluated, and the data point with the lowest entropy value is selected as a potential cluster centre. This first cluster centre and all the data points whose similarity with it is greater than a threshold value E are then removed, so that they are not considered as possible cluster centres in subsequent iterations. The search then continues for the next cluster centre, which is selected as the point with the minimal entropy value among the remaining data points. Again, this cluster centre and the associated data points whose similarity with it is greater than E are removed. This process is repeated until no data points are left.
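This excerpt does not restate the entropy measure used to rank the points. The sketch below is a minimal illustration of how the first centre would be picked. It assumes the entropy form H_i = −Σ_{j≠i} [S_ij log2 S_ij + (1 − S_ij) log2(1 − S_ij)], with S_ij = exp(−α D_ij) and α = −ln 0.5 / D̄, and this is our assumption about the measure in Yao et al. (2000). Each term is small when S_ij is close to 0 or 1 and largest when S_ij = 0.5. A point inside a tight group, far from everything else, therefore gets a low entropy. The function names are hypothetical.

```python
from __future__ import annotations

import math
from itertools import combinations


def binary_entropy(s: float) -> float:
    """-(s*log2(s) + (1-s)*log2(1-s)), treating 0*log2(0) as 0."""
    return -sum(p * math.log2(p) for p in (s, 1.0 - s) if p > 0.0)


def point_entropies(points: list[list[float]]) -> list[float]:
    """Entropy H_i of each point with respect to all other points (assumed form)."""
    n = len(points)
    d_bar = sum(math.dist(p, q) for p, q in combinations(points, 2)) / (n * (n - 1) / 2)
    alpha = -math.log(0.5) / d_bar
    return [
        sum(binary_entropy(math.exp(-alpha * math.dist(points[i], points[j])))
            for j in range(n) if j != i)
        for i in range(n)
    ]


def first_centre(points: list[list[float]]) -> int:
    """Index of the lowest-entropy point, i.e. the first cluster centre."""
    ent = point_entropies(points)
    return min(range(len(ent)), key=ent.__getitem__)


pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0], [5.0, 0.5]]
ent = point_entropies(pts)
print(ent.index(max(ent)))  # 4: the midpoint between the two pairs
```

In this example the point (5.0, 0.5) lies midway between two tight pairs. Its similarities are all near 0.5, so it has the highest entropy and is not chosen as the first centre.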

The parameter E can be viewed as a threshold on the similarity, or association, between data points in the same cluster. It takes a value in the range (0.0, 1.0), and the value E = 0.7 is quite robust, as shown experimentally in Yao et al. (2000). In the algorithm described below, T is the input data set with N data points, each of which has M dimensions.
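Assume the exponential similarity S_ij = e^{−αD_ij}, with α = −ln 0.5 / D̄ from Eq. (10.29). Under that assumption, the threshold E can be read as a distance radius around a cluster centre. The derivation below is a sketch under that assumption:

$$S_{ij} > E \iff e^{-\alpha D_{ij}} > E \iff D_{ij} < -\frac{\ln E}{\alpha} = \frac{\ln E}{\ln 0.5}\,\bar{D}$$

For E = 0.7 this gives D_ij < (ln 0.7 / ln 0.5) D̄ ≈ 0.515 D̄. A point is therefore removed together with a centre when it lies closer to that centre than roughly half the mean pairwise distance.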

Algorithm 10.1. Entropy-based fuzzy clustering: EFC(T)

• Step 1: calculate the entropy for each z_i in T, for i = 1, 2, …, N.
• Step 2: choose the point z_iMin in T that has the lowest entropy.
• Step 3: remove z_iMin, and all the data points whose similarity with the cluster centre z_iMin is greater than E, from the data set T.
• Step 4: repeat Steps 2 and 3 until T is empty.
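The steps of Algorithm 10.1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes Euclidean distances, the similarity S_ij = exp(−α D_ij) with α from Eq. (10.29), and the entropy form H_i = −Σ_{j≠i} [S_ij log2 S_ij + (1 − S_ij) log2(1 − S_ij)]. Following Step 1, entropies are computed once on the full data set. Ties are broken by the lowest index. The names `efc` and `e_threshold` are hypothetical.

```python
from __future__ import annotations

import math
from itertools import combinations


def _binary_entropy(s: float) -> float:
    """-(s*log2(s) + (1-s)*log2(1-s)), treating 0*log2(0) as 0."""
    return -sum(p * math.log2(p) for p in (s, 1.0 - s) if p > 0.0)


def efc(points: list[list[float]], e_threshold: float = 0.7) -> list[int]:
    """Entropy-based fuzzy clustering EFC(T); returns the indices of the cluster centres.

    e_threshold plays the role of the similarity threshold E (0 < E < 1).
    """
    n = len(points)
    if n < 2:
        return list(range(n))  # nothing to compare: a single point is its own centre
    pairs = list(combinations(range(n), 2))
    d_bar = sum(math.dist(points[i], points[j]) for i, j in pairs) / len(pairs)
    if d_bar == 0.0:
        return [0]  # all points coincide: one cluster
    alpha = -math.log(0.5) / d_bar  # Eq. (10.29)
    sim = [[math.exp(-alpha * math.dist(p, q)) for q in points] for p in points]

    # Step 1: entropy of every z_i, computed once on the full data set T
    entropy = [
        sum(_binary_entropy(sim[i][j]) for j in range(n) if j != i) for i in range(n)
    ]

    remaining = set(range(n))
    centres: list[int] = []
    while remaining:  # Step 4: repeat Steps 2 and 3 until T is empty
        # Step 2: the remaining point with the lowest entropy becomes a centre
        c = min(sorted(remaining), key=lambda i: entropy[i])
        centres.append(c)
        # Step 3: remove the centre and every point whose similarity to it exceeds E
        remaining -= {i for i in remaining if i == c or sim[c][i] > e_threshold}
    return centres


pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0], [5.0, 0.5]]
print(efc(pts))  # one centre per tight pair, then the midpoint (index 4) as a third centre
```

In this example the midpoint (5.0, 0.5) ends up as a centre of its own, which shows why isolated points need special treatment.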

If the data set contains outliers that are very distant from the rest of the data, the EFC algorithm described above may select these data points as cluster centres, because their entropy values will also be very low. To overcome this problem, Yao et al. (2000) introduced a new parameter J that acts as a threshold between potential clusters and outliers. Before a data point is selected as a cluster centre, the number of data points whose similarity with it is greater than E is counted. If this count is less than J, the data point is unfit to be a cluster centre and is rejected from the data set, so that it is not considered further in the next iterations. In the work of Yao et al. (2000), J = 0.05N is selected as the threshold for outlier detection. The selection of J, and hence the corresponding removal of outliers, also helps to prevent overfitting of the data.
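The outlier check can be added to the EFC loop as in the sketch below. It is hedged in several ways. It uses the same assumed similarity S_ij = exp(−α D_ij) with α = −ln 0.5 / D̄ and the same assumed entropy form as above. It counts only the other points still remaining in T, because the text does not say which set is counted. It also requires at least two distinct points. The names `efc_with_outliers` and `j_fraction` are hypothetical, with J = j_fraction · N.

```python
from __future__ import annotations

import math
from itertools import combinations


def _binary_entropy(s: float) -> float:
    """-(s*log2(s) + (1-s)*log2(1-s)), treating 0*log2(0) as 0."""
    return -sum(p * math.log2(p) for p in (s, 1.0 - s) if p > 0.0)


def efc_with_outliers(
    points: list[list[float]], e_threshold: float = 0.7, j_fraction: float = 0.05
) -> tuple[list[int], list[int]]:
    """EFC with outlier threshold J = j_fraction * N; returns (centres, outliers)."""
    n = len(points)
    pairs = list(combinations(range(n), 2))
    d_bar = sum(math.dist(points[i], points[j]) for i, j in pairs) / len(pairs)
    alpha = -math.log(0.5) / d_bar  # Eq. (10.29)
    sim = [[math.exp(-alpha * math.dist(p, q)) for q in points] for p in points]
    entropy = [sum(_binary_entropy(sim[i][j]) for j in range(n) if j != i) for i in range(n)]
    j_threshold = j_fraction * n  # J = 0.05 N by default

    remaining = set(range(n))
    centres: list[int] = []
    outliers: list[int] = []
    while remaining:
        c = min(sorted(remaining), key=lambda i: entropy[i])
        # Count the other remaining points whose similarity with the candidate exceeds E
        members = {i for i in remaining if i != c and sim[c][i] > e_threshold}
        if len(members) < j_threshold:
            outliers.append(c)  # too few similar points: reject the candidate as an outlier
            remaining.discard(c)
            continue
        centres.append(c)
        remaining -= members | {c}
    return centres, outliers


pts = [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.1, 0.1], [50.0, 50.0]]
centres, outliers = efc_with_outliers(pts)
print(outliers)  # [4]: the distant point has no neighbour with similarity above E
```

Setting `j_fraction=0.0` disables the check and recovers plain EFC. In that case the distant point at index 4 is returned as a second cluster centre instead of being flagged.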
