Constrained Partitional Clustering of Text Data: An Overview - Text Mining: Classification, Clustering, and Applications


\[
P(Y \mid \Theta, C) = \frac{1}{Z} \exp\Bigl(-\sum_{i,j} v(i,j)\Bigr) \tag{7.17}
\]

where each constraint potential function $v(i,j)$ has the following form, inspired by the generalized Potts model (32), in which $f_{ML}$ and $f_{CL}$ are the distances between the constrained points:

\[
v(i,j) =
\begin{cases}
w_{ij}\, f_{ML}(i,j) & \text{if } c_{ij} = 1 \text{ and } y_i \neq y_j \\
w_{ij}\, f_{CL}(i,j) & \text{if } c_{ij} = -1 \text{ and } y_i = y_j \\
0 & \text{otherwise}
\end{cases}
\tag{7.18}
\]
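As a rough illustration of Equations 7.17 and 7.18, the sketch below evaluates the constraint potential for a given labeling and the resulting unnormalized prior. It assumes the usual reading of the constraints: a must-link pair is penalized when its labels differ, and a cannot-link pair when they agree. The function names, the dictionary layout for constraints and weights, and the constant distance function in the example are hypothetical choices made only for this illustration.

```python
import math

def constraint_potential(i, j, labels, constraints, weights, f_ml, f_cl):
    """Potential v(i, j) in the sense of Equation 7.18 for one pair."""
    c = constraints.get((i, j), 0)   # 1 = must-link, -1 = cannot-link, 0 = none
    w = weights.get((i, j), 1.0)     # assumption: unit weight if none is given
    if c == 1 and labels[i] != labels[j]:
        return w * f_ml(i, j)        # violated must-link
    if c == -1 and labels[i] == labels[j]:
        return w * f_cl(i, j)        # violated cannot-link
    return 0.0

def unnormalized_prior(labels, constraints, weights, f_ml, f_cl):
    """exp(-sum of potentials): Equation 7.17 without the 1/Z factor."""
    total = sum(
        constraint_potential(i, j, labels, constraints, weights, f_ml, f_cl)
        for (i, j) in constraints
    )
    return math.exp(-total)

def unit_distance(i, j):
    """Placeholder distance (assumption): every violation costs 1."""
    return 1.0

labels = [0, 0, 1]
constraints = {(0, 1): 1, (0, 2): 1, (1, 2): -1}
weights = {(0, 1): 1.0, (0, 2): 2.0, (1, 2): 1.0}
prior = unnormalized_prior(labels, constraints, weights,
                           unit_distance, unit_distance)
print(prior)
```

In this toy labeling only the must-link pair (0, 2) is violated, so the sum of potentials equals its weight and the labeling is down-weighted accordingly.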

The joint probability formulation in Equation 7.16 provides a general framework for incorporating various distance measures in clustering by choosing a particular form of $p(x_i \mid y_i, \Theta)$, the probability density that generates the $i$-th instance $x_i$ from cluster $y_i$. Basu et al. (6) restrict their attention to probability densities from the exponential family, where the conditional density for observed data can be represented as follows:

\[
p(x_i \mid y_i, \Theta) = \frac{1}{Z_\Theta} \exp\bigl(-D(x_i, \mu_{y_i})\bigr) \tag{7.19}
\]

where $D(x_i, \mu_{y_i})$ is the Bregman divergence between $x_i$ and $\mu_{y_i}$, corresponding to the exponential density $p$, and $Z_\Theta$ is the normalizer (3). Different clustering models fall into this exponential form:

• If $x_i$ and $\mu_{y_i}$ are vectors in Euclidean space, and $D$ is the square of the $L_2$ distance parameterized by a positive semidefinite weight matrix $A$, $D(x_i, \mu_{y_i}) = \|x_i - \mu_{y_i}\|_A^2$, then the cluster conditional probability is a $d$-dimensional multivariate normal density (30) with covariance matrix $A^{-1}$:

\[
p(x_i \mid y_i, \Theta) = \frac{1}{(2\pi)^{d/2}\, |A|^{-1/2}} \exp\Bigl(-\tfrac{1}{2}\, \|x_i - \mu_{y_i}\|_A^2\Bigr);
\]

• If $x_i$ and $\mu_{y_i}$ are probability distributions, and $D$ is the KL-divergence, $D(x_i, \mu_{y_i}) = \sum_{m=1}^{d} x_{im} \log \frac{x_{im}}{\mu_{y_i m}}$, then the cluster conditional probability is a multinomial distribution (20).
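The two divergences listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, vectors and the matrix $A$ are plain lists, and the density is left unnormalized (the $1/Z_\Theta$ factor of Equation 7.19 is omitted).

```python
import math

def squared_mahalanobis(x, mu, A):
    """Squared L2 distance weighted by A: (x - mu)^T A (x - mu)."""
    diff = [a - b for a, b in zip(x, mu)]
    n = len(diff)
    return sum(diff[r] * A[r][c] * diff[c]
               for r in range(n) for c in range(n))

def kl_divergence(x, mu):
    """Sum over m of x_m * log(x_m / mu_m).

    Terms with x_m == 0 contribute 0; mu_m must be positive wherever
    x_m is positive.
    """
    return sum(a * math.log(a / b) for a, b in zip(x, mu) if a > 0)

def unnormalized_density(divergence):
    """exp(-D): Equation 7.19 without the 1/Z_Theta factor."""
    return math.exp(-divergence)

identity = [[1.0, 0.0], [0.0, 1.0]]   # A = I gives plain squared Euclidean distance
d_euclid = squared_mahalanobis([1.0, 2.0], [0.0, 0.0], identity)
d_kl = kl_divergence([0.5, 0.5], [0.25, 0.75])
print(d_euclid, d_kl, unnormalized_density(d_euclid))
```

Swapping the divergence passed to `unnormalized_density` is all that changes between the two models in this sketch, which mirrors how Equation 7.19 separates the choice of distance from the rest of the formulation.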

The relation in Equation 7.19 holds even if $D$ is not a Bregman divergence but a directional distance measure such as cosine distance, which is useful in text clustering. If $x_i$ and $\mu_{y_i}$ are vectors of unit length and $D$ is one minus the dot product of the vectors, $D(x_i, \mu_{y_i}) = 1 - \frac{\sum_{m=1}^{d} x_{im}\, \mu_{y_i m}}{\|x_i\|\, \|\mu_{y_i}\|}$, then the cluster conditional probability is a von Mises-Fisher (vMF) distribution with unit concentration parameter (2), which is the spherical analog of a Gaussian.
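A minimal sketch of the cosine distance described here, using plain Python lists; the function name is a hypothetical choice for this illustration. For unit-length inputs the two norms equal 1, so the value reduces to one minus the dot product, and $\exp(-D)$ is then proportional to $\exp(x_i^\top \mu_{y_i})$, the vMF form with unit concentration.

```python
import math

def cosine_distance(x, mu):
    """One minus the cosine of the angle between x and mu.

    Both vectors must be non-zero. Dividing by the norms makes the
    function usable on inputs that are not already unit length.
    """
    dot = sum(a * b for a, b in zip(x, mu))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_mu = math.sqrt(sum(b * b for b in mu))
    return 1.0 - dot / (norm_x * norm_mu)

d_orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])   # no shared direction
d_parallel = cosine_distance([3.0, 4.0], [6.0, 8.0])     # same direction
print(d_orthogonal, d_parallel)
```

Because the distance depends only on direction, two documents with the same term proportions but different lengths are treated as identical, which is the usual motivation for this measure in text clustering.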

Putting Equation 7.19 into Equation 7.16 and taking logarithms gives the following cluster objective function, minimizing which is equivalent to maximizing the joint probability over the HMRF in Equation 7.16:
