A Novel Linear Cellular Automata-Based Data Clustering Algorithm - Foundations on Natural and Artificial Computation - page 75


In the initial iterations, shown on the left side, the data items keep their
initial random order. As the algorithm runs, the data items begin to group
together and the clusters start to appear. Around iteration 75 (approximately
in the middle of the left diagram), the black and grey clusters are almost
formed; only a few data items belonging to these clusters remain on the right
side. The white cluster is divided into two noncontiguous clusters. Around
iteration 200 the black cluster is completely formed and stays in that state
until the end. The final state of the tape, in which the data items are
grouped into the 4 existing clusters, is reached at iteration 283. Once
convergence is reached, this state is maintained.

Afterwards, an important post-processing step has to be applied to
automatically group the data items scattered on the tape.

Since the data items are linearly grouped on the cellular automaton tape, a
straightforward way of finding the natural clusters within the data is to
analyze the chainmap diagram formed by the distances between successive data
items. The main problem is to automatically detect the optimum threshold that
gives the correct number of clusters for each dataset.
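The chainmap idea can be illustrated with a minimal sketch. It assumes the tape is a list of coordinate tuples read in tape order, and it takes the threshold as a given parameter rather than detecting it automatically; the function names `chainmap` and `split_by_threshold` and the toy data are hypothetical and are not taken from the paper.

```python
import math

def chainmap(tape):
    """Euclidean distance from each tape cell to its successor.

    The tape is treated as a ring, so the last cell is compared with the first.
    """
    n = len(tape)
    return [math.dist(tape[i], tape[(i + 1) % n]) for i in range(n)]

def split_by_threshold(tape, threshold):
    """Label cells by cutting the ring wherever the chain distance exceeds threshold."""
    n = len(tape)
    dists = chainmap(tape)
    cuts = [i for i, d in enumerate(dists) if d > threshold]
    labels = [0] * n
    if not cuts:
        return labels  # no frontier found: everything is one cluster
    # Start right after the last cut so a cluster spanning the wrap-around stays whole.
    start = (cuts[-1] + 1) % n
    label = 0
    for step in range(n):
        i = (start + step) % n
        labels[i] = label
        if dists[i] > threshold and step < n - 1:
            label += 1  # the next cell opens a new cluster
    return labels

# Toy 1-D tape (illustrative data only): the last item belongs with the first two.
tape = [(0.0,), (0.1,), (5.0,), (5.1,), (10.0,), (0.2,)]
print(split_by_threshold(tape, 1.0))  # [0, 0, 1, 1, 2, 0]
```

On a ring, each distance above the threshold is one frontier, so the number of clusters equals the number of cuts (or one if there are no cuts at all).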

Fig. 5 displays the chainmap of the synthetic dataset, in which 4 local
maxima can be seen, each one corresponding to an individual cluster. These
local maxima are located at bins 20, 40, 60 and 80. Notice that the last one
is needed to define the frontier between the last cluster and the first one,
because we are considering a toroidal lattice.
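As a rough illustration of why the wrap-around matters, the sketch below locates local maxima on a ring, so that a peak in the last bin is compared with the first bin. The function name `circular_peaks` and the `min_height` parameter are assumptions made for this example; the values are toy numbers, not the data behind Fig. 5.

```python
def circular_peaks(values, min_height):
    """Indices at least min_height high and strictly above both ring neighbours."""
    n = len(values)
    return [
        i for i in range(n)
        if values[i] >= min_height
        and values[i] > values[(i - 1) % n]   # left neighbour (wraps at index 0)
        and values[i] > values[(i + 1) % n]   # right neighbour (wraps at the end)
    ]

# Toy chainmap with a frontier in the last bin (illustrative values only).
toy = [0.1, 0.9, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9]
print(circular_peaks(toy, 0.5))  # [1, 3, 5, 7]
```

Without the modulo indexing, the peak in the final bin would have no right neighbour to be compared with, and the frontier between the last and first clusters would be missed.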

We have also applied standard k-means clustering to this synthetic dataset
and achieved similarly successful results for k = 4. As in our proposed
algorithm, we used the Euclidean distance as the metric.
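For reference, a minimal sketch of standard k-means (Lloyd's iterations) with the Euclidean distance is given below. The text does not say which implementation or initialisation was used; the farthest-point initialisation here is an assumption chosen only to make the example deterministic, and the function name `kmeans` and the toy points are hypothetical.

```python
import math

def kmeans(points, k, max_iters=100):
    """Plain Lloyd's k-means with Euclidean distance; returns (labels, centroids)."""
    # Farthest-point initialisation (an assumption, chosen for determinism).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Four well-separated toy groups (illustrative data only).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 0.0), (10.0, 1.0), (11.0, 0.0),
          (0.0, 10.0), (0.0, 11.0), (1.0, 10.0),
          (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(points, k=4)
```

Unlike the chainmap approach, k-means needs the number of clusters k as an input.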

[Plot: vertical axis from 0.0 to 1.0, horizontal axis from 0 to 80]

Fig. 5. Chainmap of the synthetic dataset
