Database Reference
In-Depth Information
7.4 Partitional Clustering with Constraints
Some of the very first algorithms that made use of constraints were varia-
tions of the popular KMeans iterative algorithm. The purpose of the KMeans
algorithm is to (locally) minimize the vector quantization error (also known
as the distortion) shown in Equation 7.2.
k
VQE =
VQE j
(7.2)
j =1
2
x i
1
D ( μ j ,x i ) 2
VQE j =
(7.3)
π j
where j indexes over the clusters, and k is the number of clusters (specified
as an input parameter), and D is the distance function.
The KMeans algorithm is an iterative algorithm which in every step at-
tempts to further minimize the distortion. Given a set of cluster centroids,
the algorithm assigns instances to their nearest centroid which of course min-
imizes the distortion. This is step 1 of the algorithm. Step 2 is to recalculate
the cluster centroids so as to minimize the distortion. This can be achieved
by taking the first order derivative of the error (Equation 7.3) with respect to
the j th centroid and setting it to zero and solving. A solution to the resulting
equation gives us the KMeans centroid update rule as shown in Equation 7.5.
d ( x i ∈π j D ( μ j ,x i ) 2 )
d ( μ j )
d ( VQE j )
d ( μ j )
=
=0
(7.4)
μ j =
x i ∈π j
x i /
|
π j |
(7.5)
Recall that π j is the set of instances closest to the centroid of the j th cluster.
These two steps are used in the standard KMeans algorithm shown in Figure
7.4 .
7.4.1 COP-KMeans
The COP-KMeans algorithm shown in Figure 7.4.1 can be seen to be a
two part variation of the KMeans algorithm that incorporates conjunctions
of constraints. Firstly, the transitive closure over the must-linked instances
is computed, so that c = ( x, y )and c = ( y, z )
c = ( x, z )and x, y, z form a con-
nected component. The resultant connected components are replaced by a
 
Search WWH ::




Custom Search