probability) is lower for orderly configurations and higher for disorderly configurations. Therefore, if we view the complete data set from an individual data point, an orderly configuration means that for most of the individual data points there are some data points close by (i.e. points that probably belong to the same cluster) and others far away. By the same reasoning, a disorderly configuration means that most of the data points are scattered randomly. So, if the entropy is evaluated at each data point, then the data point with the minimum entropy is a good candidate for the cluster centre. This may not hold if the data contain outliers, in which case the outliers should be removed before the cluster centres are determined. The next section addresses this issue further.
The entropy measure between two data points can assume any value in the range [0, 1]. It takes very low values (close to zero) for very close pairs of data points, and very high values (close to unity) for pairs separated by a distance close to the mean distance over all pairs of data points. The similarity measure $S$ is based on distance: it assumes a very large value (close to unity) for very close pairs of data points, which probably fall in the same cluster, and a very small value (close to zero) for very distant pairs, which probably fall into different clusters. The entropy at one data point with respect to another data point is defined as
\[
E = -S \log_2 S - (1 - S)\log_2 (1 - S).
\tag{10.26}
\]
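As a quick numerical check of this expression, the following minimal Python sketch evaluates it at the extremes and at the midpoint (the function name is illustrative):

```python
import math

def pair_entropy(s):
    """Entropy of a similarity value s in [0, 1], Eq. (10.26).

    By convention 0 * log2(0) = 0, so the entropy vanishes at s = 0 and s = 1.
    """
    if s <= 0.0 or s >= 1.0:
        return 0.0
    return -s * math.log2(s) - (1.0 - s) * math.log2(1.0 - s)

print(pair_entropy(0.5))  # maximum: 1.0
print(pair_entropy(0.0))  # minimum: 0.0
print(pair_entropy(1.0))  # minimum: 0.0
```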
From the above expression it can be seen that the entropy assumes its maximum value of 1.0 when the similarity $S = 0.5$, and its minimum value of 0.0 when $S = 0.0$ or $S = 1.0$ (Klir and Folger, 1988). The total entropy value at a data point $z_i$ with respect to all other data points is defined as
\[
E_i = -\sum_{\substack{j \in Z \\ j \neq i}} \left\{ S_{ij}\log_2 S_{ij} + (1 - S_{ij})\log_2 (1 - S_{ij}) \right\},
\tag{10.27}
\]
where $S_{ij}$ is the similarity between the data points $z_i$ and $z_j$, normalized to [0.0, 1.0]. It is defined as
\[
S_{ij} = e^{-\alpha D_{ij}},
\tag{10.28}
\]
where $D_{ij}$ is the distance between the data points $z_i$ and $z_j$, and $\alpha$ is a positive parameter. If the similarity is plotted against the distance, the resulting curve shows greater curvature for a larger value of $\alpha$. Experiments with various values of $\alpha$ suggest that no single fixed value is robust for all kinds of data sets. Yao et al. (2000) therefore proposed calculating the value of $\alpha$ automatically by assigning a similarity of 0.5 in Equation (10.28) when the distance between two data points equals the mean distance over all pairs of data points. This produced good results, as confirmed in various experiments (Yao et al., 2000). Mathematically, this can be expressed as
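The condition just described determines $\alpha$ directly: solving $e^{-\alpha \bar{D}} = 0.5$ for the mean pairwise distance $\bar{D}$ gives $\alpha = \ln 2 / \bar{D}$. A minimal sketch of this automatic choice, assuming Euclidean distance (the function name is illustrative):

```python
import math
from itertools import combinations

def auto_alpha(points):
    """Choose alpha so that the similarity of Eq. (10.28) equals 0.5
    when the distance equals the mean distance over all pairs:
    exp(-alpha * d_mean) = 0.5  =>  alpha = ln(2) / d_mean.
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    d_mean = sum(dists) / len(dists)
    return math.log(2.0) / d_mean

# Corners of a 3-4-5 rectangle: pairwise distances 3, 4, 5, 5, 4, 3,
# so the mean distance is exactly 4 and alpha = ln(2) / 4.
pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
alpha = auto_alpha(pts)
# By construction, a pair at exactly the mean distance has similarity 0.5.
```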