Fig. 6. The Iris dataset. The triangle marks represent the Iris setosa, the circles
correspond to the Iris versicolor and the squares are used for the Iris virginica.
3.2 Clustering Results with the Iris Dataset
We have also tested the one-dimensional cellular automata-based clustering
algorithm on a real dataset, the Iris dataset. The Iris dataset [4] was introduced
by Fisher [3] in his pioneering research work on linear discriminant analysis,
and it remains a standard pattern recognition benchmark for testing
discriminant techniques and algorithms.
In this classical multiclass pattern recognition problem, three classes of Iris
flowers (setosa, versicolor and virginica) have to be classified according to
four continuous discriminant variables measured in centimeters: sepal length,
sepal width, petal length and petal width.
Fig. 6 shows the three classes. Only three variables of this four-dimensional
dataset are represented: the sepal length, the sepal width and the petal width.
The triangle marks represent the Iris setosa, the circles are the Iris versicolor
items and the squares correspond to the Iris virginica.
It is well known that this dataset contains only two clearly separated clusters.
The Iris setosa forms one of those clusters, while the other two species,
Iris versicolor and Iris virginica, fall together in the other cluster.
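
The two-cluster structure can be checked informally with a few lines of Python. The sketch below is only an illustration and is not part of the clustering algorithm tested here: it takes three rows per species from the Iris data (a small hand-picked sample, not the full 150-item dataset) and compares the smallest Euclidean distance between classes in the four-dimensional feature space. The names SAMPLE and min_gap are hypothetical helpers introduced for this example.

```python
from itertools import product
from math import dist

# Three rows per species from the Iris data, in cm:
# (sepal length, sepal width, petal length, petal width).
# This is a small illustrative sample, not the complete dataset.
SAMPLE = {
    "setosa": [(5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2)],
    "versicolor": [(7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (5.5, 2.3, 4.0, 1.3)],
    "virginica": [(6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9), (7.1, 3.0, 5.9, 2.1)],
}


def min_gap(class_a, class_b):
    """Smallest Euclidean distance between any item of class_a and any item of class_b."""
    return min(dist(p, q) for p, q in product(SAMPLE[class_a], SAMPLE[class_b]))


if __name__ == "__main__":
    for a, b in [("setosa", "versicolor"),
                 ("setosa", "virginica"),
                 ("versicolor", "virginica")]:
        print(f"{a:>10} vs {b:<10} min distance = {min_gap(a, b):.2f} cm")
```

On this sample, the gap between setosa and either of the other two species is clearly larger than the gap between versicolor and virginica, which is consistent with the two-cluster observation above. With so few rows the exact figures are only indicative and would differ on the full dataset.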
 