1.1.2 Designing an Unsupervised Learning Algorithm
Let us consider the well-known Iris data-set [86], which contains 150 instances,
each consisting of four scalar attribute values and a class assignment. Each of
the four attributes refers to a particular measure of the physical appearance of
the flower, and each instance belongs to one of the three possible classes of the plant.
Assume that it is unknown which class each instance belongs to and that we
want to design an algorithm that groups the instances into three classes, based
on their similarity of appearance that is inferred from the similarity of their
attribute values. This task is an unsupervised learning task with the inputs
given by the attribute values of each instance.
Ad-Hoc Design of an Algorithm
Let us first approach the task intuitively by designing an algorithm that aims
at grouping the instances such that the similarity of any two instances within
the same group or cluster is maximised, and between different clusters is mini-
mised. The similarity between two instances is measured by the inverse squared
Euclidean distance 1 between the points that represent these instances in the
four-dimensional attribute space spanned by the attribute values.
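As a concrete illustration (not part of the original text), this similarity measure could be coded as follows; the function names and the small eps constant used to avoid division by zero are our own choices:

```python
import numpy as np

def sq_euclidean(a, b):
    # squared Euclidean distance between two attribute vectors
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum((a - b) ** 2))

def similarity(a, b, eps=1e-12):
    # inverse squared Euclidean distance; eps guards against
    # division by zero for identical instances
    return 1.0 / (sq_euclidean(a, b) + eps)
```

Under this measure, two instances are more similar the smaller the squared differences between their attribute values are.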
Starting by randomly assigning each instance to one of the three clusters, the
centre of each of these clusters is computed by the average attribute values of
all instances assigned to that cluster. To group similar instances into the same
cluster, each instance is now re-assigned to the cluster to whose centre it is
closest. Subsequently, the centres of the clusters are recomputed. Iterating these
two steps causes the distance between instances within the same cluster to be
minimised, and between clusters to be maximised. Thus, we have reached our
goal. The concept of clustering by using the inverse distance between the data
points as a measure of their similarity is illustrated in Figure 1.1(a).
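The two alternating steps described above can be sketched in a few lines of NumPy; this is a minimal illustration written for this text, not the authors' implementation, and the function signature and convergence check are our own assumptions:

```python
import numpy as np

def kmeans(X, k=3, n_iter=100, seed=0):
    # X: (n_instances, n_attributes) array of attribute values
    rng = np.random.default_rng(seed)
    # start by randomly assigning each instance to one of the k clusters
    assign = rng.integers(0, k, size=len(X))
    for _ in range(n_iter):
        # each cluster centre is the average attribute vector of its
        # assigned instances; an empty cluster is re-seeded with a
        # randomly picked instance (one possible ad-hoc fix)
        centres = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j)
            else X[rng.integers(len(X))]
            for j in range(k)
        ])
        # re-assign each instance to the cluster with the closest centre
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break  # stable solution reached
        assign = new_assign
    return assign, centres
```

Iterating the centre-update and re-assignment steps until the assignment no longer changes yields the stable, though not necessarily optimal, solution discussed below.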
This clustering algorithm is the well-known K-means algorithm, which is
guaranteed to converge to a stable solution that is, however, not always optimal
[160, 19]. While it is a functional algorithm, it leaves open many questions: is
the squared Euclidean distance indeed the best distance measure to use? What
are the implicit assumptions that are made about the data? How should we
handle data where the number of classes is unknown? In which cases would the
algorithm fail?
Design of Algorithm by Modelling the Data
Let us approach the same problem from a different perspective: assume that for
each Iris class there is a virtual standard instance — something like a prototypical
Iris — and that all instances of a class are just noisy instantiations of the
1 The squared Euclidean distance between two equally-sized vectors a = (a_1, a_2, ...)^T
and b = (b_1, b_2, ...)^T is given by Σ_i (a_i − b_i)^2 and is thus proportional to the sum of
squared differences between the vectors' elements (see also Section 5.2). Therefore,
two instances are considered as being similar if the squared differences between their
attribute values are small.