Fuzzy Logic Approach - Computational Intelligence in Time Series Forecasting


Number of clusters

The number of clusters c is the most important parameter, in the sense that the remaining parameters have less influence on the resulting partition. When clustering real data without any prior information about the structures in the data, one usually has to make assumptions about the number of underlying clusters. The chosen clustering algorithm then searches for c clusters, regardless of whether they are really present in the data. Two main approaches to determining the appropriate number of clusters can be distinguished:

A. Validity measures

Validity measures are scalar indices that assess the goodness of the obtained partition. Clustering algorithms generally aim at locating well-separated and compact clusters. When the number of clusters is chosen equal to the number of groups actually present in the data, the clustering algorithm can be expected to identify them correctly. When this is not the case, misclassifications appear, and the clusters are not likely to be well separated and compact. Hence, most cluster validity measures quantify the separation and compactness of the clusters. These notions are, however, open to interpretation and can be formulated in different ways. Consequently, many validity measures have been introduced in the literature (Bezdek, 1981; Gath and Geva, 1989; Pal and Bezdek, 1995). For the FCM algorithm, the Xie-Beni index (Xie and Beni, 1991)

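$$
\chi(\mathbf{Z}; \mathbf{U}, \mathbf{V}) = \frac{\sum_{g=1}^{c} \sum_{s=1}^{N} \mu_{gs}^{m} \, \|\mathbf{z}_s - \mathbf{v}_g\|^2}{c \cdot \min_{g \neq h} \|\mathbf{v}_g - \mathbf{v}_h\|^2} \tag{4.23}
$$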

has been found to perform well in practice. This index can be interpreted as the ratio of the total within-group variance to the separation of the cluster centers. The best partition minimizes the value of $\chi(\mathbf{Z}; \mathbf{U}, \mathbf{V})$.
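The Xie-Beni index can be evaluated directly from a data set, a fuzzy partition matrix and the cluster centers. The following is a minimal NumPy sketch, not taken from the book: the function name `xie_beni`, the array layout (memberships stored as a c-by-N array) and the toy data are assumptions made for illustration only.

```python
import numpy as np

def xie_beni(Z, U, V, m=2.0):
    """Xie-Beni index for data Z (N x n), memberships U (c x N), centers V (c x n).

    Requires at least two clusters, since the separation term is a
    minimum over pairs of distinct centers.
    """
    Z = np.asarray(Z, dtype=float)
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    c = V.shape[0]
    # d2[g, s] = squared Euclidean distance between center v_g and point z_s
    d2 = ((Z[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    # numerator: membership-weighted total within-group variance
    compactness = (U ** m * d2).sum()
    # denominator: c times the smallest squared distance between distinct centers
    vv = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    separation = vv[~np.eye(c, dtype=bool)].min()
    return compactness / (c * separation)

# Toy one-dimensional data with two obvious groups (assumed for illustration).
Z = np.array([[0.0], [1.0], [10.0], [11.0]])

# A partition that matches the two groups.
U_good = np.array([[1.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])
V_good = np.array([[0.5], [10.5]])

# A partition that places both centers inside the first group.
U_bad = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0, 1.0]])
V_bad = np.array([[0.0], [1.0]])

xb_good = xie_beni(Z, U_good, V_good)
xb_bad = xie_beni(Z, U_bad, V_bad)
print(xb_good, xb_bad)
```

In this toy case the partition that matches the two groups yields the smaller index value, in line with the rule that the best partition minimizes the index. In practice one would typically run the clustering for several candidate values of c and compare the resulting index values.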

B. Iterative merging

In iterative cluster merging, one starts with a sufficiently large number of clusters and successively reduces this number by merging clusters that are similar (compatible) with respect to some well-defined criteria (Krishnapuram and Freg, 1992; Kaymak and Babuška, 1995). One can also adopt the opposite approach, i.e. start with a small number of clusters and iteratively insert clusters in the regions where the data points have a low degree of membership in the existing clusters (Gath and Geva, 1989).
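The merging idea can be sketched as a simple loop. The compatibility criterion used below, a Euclidean distance between cluster centers that falls below a threshold, is only a stand-in assumption and is not the criterion of the cited papers. The function name `merge_closest_centers`, the `sizes` weights (for example the sum of memberships per cluster) and the threshold value are likewise hypothetical.

```python
import numpy as np

def merge_closest_centers(V, sizes, threshold):
    """Merge the two closest centers repeatedly while they are closer than threshold.

    V: (c x n) cluster centers; sizes: c weights used to average merged centers.
    Returns the reduced centers and their accumulated weights.
    """
    V = [np.asarray(v, dtype=float) for v in V]
    sizes = [float(w) for w in sizes]
    while len(V) > 1:
        # find the closest pair of centers (assumed compatibility measure)
        best = None
        for g in range(len(V)):
            for h in range(g + 1, len(V)):
                d = np.linalg.norm(V[g] - V[h])
                if best is None or d < best[0]:
                    best = (d, g, h)
        d, g, h = best
        if d >= threshold:
            break  # no remaining pair is considered compatible
        # replace the pair by its weighted mean and drop the second center
        w = sizes[g] + sizes[h]
        V[g] = (sizes[g] * V[g] + sizes[h] * V[h]) / w
        sizes[g] = w
        del V[h]
        del sizes[h]
    return np.array(V), np.array(sizes)

# Four initial centers, of which two pairs nearly coincide (toy values).
V0 = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.1, 5.0]])
sizes0 = np.array([10.0, 10.0, 20.0, 20.0])

merged_V, merged_sizes = merge_closest_centers(V0, sizes0, threshold=1.0)
print(merged_V)
print(merged_sizes)
```

A full procedure would normally re-run the clustering algorithm with the reduced number of clusters after each merge. The sketch only shows how the number of clusters shrinks.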

Fuzziness parameter

The fuzziness exponent, or weighting exponent, m is also an important parameter that must be selected properly, because it significantly influences the fuzziness of the resulting partition. As m approaches one from above, the partition becomes hard ($\mu_{gs} \in \{0, 1\}$) and the $\mathbf{v}_g$ are the ordinary means of the clusters. On the other hand, as $m \to \infty$, the partition becomes completely fuzzy ($\mu_{gs} = 1/c$) and the
