Constrained Partitional Clustering of Text Data: An Overview - Text Mining: Classification, Clustering, and Applications

Database Reference

In-Depth Information

7.4 Partitional Clustering with Constraints

Some of the very first algorithms that made use of constraints were varia-

tions of the popular KMeans iterative algorithm. The purpose of the KMeans

algorithm is to (locally) minimize the vector quantization error (also known

as the distortion) shown in Equation 7.2.

k

VQE =

VQE j

(7.2)

j =1

2

x i ∈

1

D ( μ j ,x i ) 2

VQE j =

(7.3)

π j

where j indexes over the clusters, and k is the number of clusters (specified

as an input parameter), and D is the distance function.

The KMeans algorithm is an iterative algorithm which in every step at-

tempts to further minimize the distortion. Given a set of cluster centroids,

the algorithm assigns instances to their nearest centroid which of course min-

imizes the distortion. This is step 1 of the algorithm. Step 2 is to recalculate

the cluster centroids so as to minimize the distortion. This can be achieved

by taking the first order derivative of the error (Equation 7.3) with respect to

the j th centroid and setting it to zero and solving. A solution to the resulting

equation gives us the KMeans centroid update rule as shown in Equation 7.5.

d ( x i ∈π j D ( μ j ,x i ) 2 )

d ( μ j )

d ( VQE j )

d ( μ j )

=

=0

(7.4)

μ j =

x i ∈π j

⇒

x i /

|

π j |

(7.5)

Recall that π j is the set of instances closest to the centroid of the j th cluster.

These two steps are used in the standard KMeans algorithm shown in Figure

7.4 .

7.4.1 COP-KMeans

The COP-KMeans algorithm shown in Figure 7.4.1 can be seen to be a

two part variation of the KMeans algorithm that incorporates conjunctions

of constraints. Firstly, the transitive closure over the must-linked instances

is computed, so that c = ( x, y )and c = ( y, z )

c = ( x, z )and x, y, z form a con-

nected component. The resultant connected components are replaced by a

⇒

Text Mining: Classification, Clustering, and Applications

Search WWH ::

Custom Search

Home