1.1.2 Designing an Unsupervised Learning Algorithm
Let us consider the well-known Iris data-set [86], which contains 150 instances,
each consisting of four scalar attribute values and a class assignment. Each of
the four attributes refers to a particular measure of the physical appearance of
the flower, and each instance belongs to one of the three possible classes of the plant.
Assume that it is unknown which class each instance belongs to and that we
want to design an algorithm that groups the instances into three classes, based
on their similarity of appearance that is inferred from the similarity of their
attribute values. This task is an unsupervised learning task with the inputs
given by the attribute values of each instance.
Ad-Hoc Design of an Algorithm
Let us first approach the task intuitively by designing an algorithm that aims
at grouping the instances such that the similarity of any two instances within
the same group or cluster is maximised, and between different clusters is mini-
mised. The similarity between two instances is measured by the inverse squared
Euclidean distance 1 between the points that represent these instances in the
four-dimensional attribute space spanned by the attribute values.
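As a concrete illustration (not part of the original text), this similarity measure could be coded as follows; the function names and the small eps constant used to avoid division by zero are our own choices:

```python
import numpy as np

def sq_euclidean(a, b):
    # squared Euclidean distance between two attribute vectors
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum((a - b) ** 2))

def similarity(a, b, eps=1e-12):
    # inverse squared Euclidean distance; eps guards against
    # division by zero for identical instances
    return 1.0 / (sq_euclidean(a, b) + eps)
```

Under this measure, two instances are more similar the smaller the squared differences between their attribute values are.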
Starting by randomly assigning each instance to one of the three clusters, the
centre of each of these clusters is computed by the average attribute values of
all instances assigned to that cluster. To group similar instances into the same
cluster, each instance is now re-assigned to the cluster to whose centre it is
closest. Subsequently, the centres of the clusters are recomputed. Iterating these
two steps causes the distance between instances within the same cluster to be
minimised, and between clusters to be maximised. Thus, we have reached our
goal. The concept of clustering by using the inverse distance between the data
points as a measure of their similarity is illustrated in Figure 1.1(a).
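The two alternating steps described above can be sketched in a few lines of NumPy; this is a minimal illustration written for this text, not the authors' implementation, and the function signature and convergence check are our own assumptions:

```python
import numpy as np

def kmeans(X, k=3, n_iter=100, seed=0):
    # X: (n_instances, n_attributes) array of attribute values
    rng = np.random.default_rng(seed)
    # start by randomly assigning each instance to one of the k clusters
    assign = rng.integers(0, k, size=len(X))
    for _ in range(n_iter):
        # each cluster centre is the average attribute vector of its
        # assigned instances; an empty cluster is re-seeded with a
        # randomly picked instance (one possible ad-hoc fix)
        centres = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j)
            else X[rng.integers(len(X))]
            for j in range(k)
        ])
        # re-assign each instance to the cluster with the closest centre
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break  # stable solution reached
        assign = new_assign
    return assign, centres
```

Iterating the centre-update and re-assignment steps until the assignment no longer changes yields the stable, though not necessarily optimal, solution discussed below.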
This clustering algorithm is the well-known K-means algorithm, which is
guaranteed to converge to a stable solution that is, however, not always optimal
[160, 19]. While it is a functional algorithm, it leaves open many questions: is
the squared Euclidean distance indeed the best distance measure to use? What
are the implicit assumptions that are made about the data? How should we
handle data where the number of classes is unknown? In which cases would the
algorithm fail?
Design of Algorithm by Modelling the Data
Let us approach the same problem from a different perspective: assume that for
each Iris class there is a virtual standard instance — something like a prototypical
Iris — and that all instances of a class are just noisy instantiations of the
1 The squared Euclidean distance between two equally-sized vectors a = (a_1, a_2, ...)^T
and b = (b_1, b_2, ...)^T is given by Σ_i (a_i − b_i)^2 and is thus proportional to the sum of
squared differences between the vectors' elements (see also Section 5.2). Therefore,
two instances are considered as being similar if the squared differences between their
attribute values are small.