Constrained Partitional Clustering of Text Data: An Overview - Text Mining: Classification, Clustering, and Applications

Database Reference

In-Depth Information

The intuitive interpretation of the centroid update rule is that if a must-link

constraint is violated, the cluster centroid is moved towards the other cluster

containing the other instance. Similarly, the interpretation of the update

rule for a cannot-link constraint violation is that cluster centroid containing

both constrained instances should be moved to the nearest cluster centroid so

that one of the instances eventually gets assigned to it, thereby satisfying the

constraint.

7.4.3 LCVQE: An Extension to CVQE

Pelleg and Baras (39) create a variation of the assignment and update rules

for CVQE that they term LCVQE. Though their algorithm was not derived to

minimize a particular objective function, it was shown to improve performance

on several standard datasets both in terms of accuracy and run-time. The two

main extensions made by this algorithm over CVQE are: a) not computing

all possible k 2 assignments but only a subset of reasonable assignments and

b) Changing the penalty for a cannot-link constraint to be the distance from

the most outlying (with respect to the cluster centroid) instance in the CL

constraint to the cluster centroid nearest it.

The assignment step shown in Equation 7.9 and the centroid update rule

are shown in Equation 7.10.

argmin j D ( x i ,μ j ) 2

∀

x i

∈

C = ∪

C = :

(7.9)

∀

( x a ,x b )

∈

C = : argmin [ i = g ( x a ) ,j = g ( x b )] , [ i = g ( x a ) ,j = i, [ i = j,j = g ( x b )]

D ( x a ,μ i ) 2 + D ( x b ,μ j ) 2 +

D ( μ j ,μ i ) 2

δ ( a, b )

∗

∀

( x a ,x b )

∈

C = : argmin [ i = g ( x a ) ,j = g ( x b )] , [ i = g ( x a ) ,j = i.D ( x a ,g ( x a )) <D ( x b ,g ( x b ))]

D ( x a ,μ i ) 2 + D ( x b ,μ j ) 2 + δ ( a, b )

D ( μ j ,μ g ( x b )) 2

∗

μ j =

(7.10)

x i ∈π j [ x i + ( x i ,x a ) ∈C = ,

g ( x i )= g ( x a )

μ g ( x a ) +

μ g ( x a ) ]

g ( x i )= g ( x a ) ,D ( x i ) <D ( x a )

(

x i ,x a ) ∈

+ s i ∈μ j , ( s i ,s x ) ∈C = ,g ( s i ) = g ( s x ) 1+ s i ∈μ j , ( s i ,s x ) ∈C = ,g ( s i ) = g ( s x ) 1

μ j |

7.4.4 Probabilistic Penalty - PKM

The PKM algorithm allows constraints to be violated during clustering, but

enforces a probabilistic penalty of constraint violation. It is a special case of

the HMRF-KMeans algorithm, which is described in detail in Section 7.6 —

PKM is an ablation of HMRF-KMeans, doing constraint enforcement but not

performing distance learning.

Text Mining: Classification, Clustering, and Applications

Search WWH ::

Custom Search

Home