\[
P(Y \mid \Theta, C) = \frac{1}{Z} \exp\Bigl(-\sum_{i,j} v(i,j)\Bigr) \tag{7.17}
\]
where each constraint potential function $v(i,j)$ has the following form, inspired
by the generalized Potts model (32), and $f_{ML}$ and $f_{CL}$ are the distances
between the constrained points:
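\[
v(i,j) =
\begin{cases}
w_{ij}\, f_{ML}(i,j) & \text{if } c_{ij} = 1 \text{ and } y_i \neq y_j \\
w_{ij}\, f_{CL}(i,j) & \text{if } c_{ij} = -1 \text{ and } y_i = y_j \\
0 & \text{otherwise}
\end{cases}
\tag{7.18}
\]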
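To make the two equations above concrete, here is a minimal Python sketch of the constraint potential and the unnormalized prior. It assumes the usual reading of the notation, namely that $c_{ij} = 1$ marks a must-link pair and $c_{ij} = -1$ a cannot-link pair, so a penalty is paid only when a constraint is violated. The function names, the dictionary layout for constraints and weights, and the constant placeholder penalty `unit` are illustrative assumptions, not part of the original formulation.

```python
import math

def constraint_potential(i, j, labels, constraints, weights, f_ml, f_cl):
    """Potential v(i, j) in the form of Equation 7.18 for one pair of points."""
    c = constraints.get((i, j), 0)  # assumed coding: 1 must-link, -1 cannot-link, 0 none
    if c == 1 and labels[i] != labels[j]:    # violated must-link
        return weights[(i, j)] * f_ml(i, j)
    if c == -1 and labels[i] == labels[j]:   # violated cannot-link
        return weights[(i, j)] * f_cl(i, j)
    return 0.0

def unnormalized_prior(labels, constraints, weights, f_ml, f_cl):
    """exp(-sum of v(i, j)) as in Equation 7.17, leaving out the normalizer Z."""
    total = sum(
        constraint_potential(i, j, labels, constraints, weights, f_ml, f_cl)
        for (i, j) in constraints
    )
    return math.exp(-total)

# Toy data (hypothetical): one must-link pair and one cannot-link pair.
constraints = {(0, 1): 1, (1, 2): -1}
weights = {(0, 1): 2.0, (1, 2): 3.0}

def unit(i, j):
    """Placeholder penalty; a real f_ML or f_CL would depend on the points."""
    return 1.0

satisfied = unnormalized_prior([0, 0, 1], constraints, weights, unit, unit)
violated = unnormalized_prior([0, 1, 1], constraints, weights, unit, unit)
```

A labeling that satisfies every constraint incurs zero total potential, so its unnormalized prior is the largest possible; each violated constraint multiplies the prior by a factor smaller than one.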
The joint probability formulation in Equation 7.16 provides a general framework
for incorporating various distance measures in clustering by choosing a
particular form of $p(x_i \mid y_i, \Theta)$, the probability density that generates
the $i$-th instance $x_i$ from cluster $y_i$. Basu et al. (6) restrict their attention
to probability densities from the exponential family, where the conditional density
for observed data can be represented as follows:
\[
p(x_i \mid y_i, \Theta) = \frac{1}{Z_\Theta} \exp\bigl(-D(x_i, \mu_{y_i})\bigr) \tag{7.19}
\]
where $D(x_i, \mu_{y_i})$ is the Bregman divergence between $x_i$ and $\mu_{y_i}$
corresponding to the exponential density $p$, and $Z_\Theta$ is the normalizer (3).
Different clustering models fall into this exponential form:
If $x_i$ and $\mu_{y_i}$ are vectors in Euclidean space, and $D$ is the square of the
$L_2$ distance parameterized by a positive semidefinite weight matrix $A$,
$D(x_i, \mu_{y_i}) = \|x_i - \mu_{y_i}\|_A^2$, then the cluster conditional probability
is a $d$-dimensional multivariate normal density with covariance matrix $A^{-1}$ (30):
\[
p(x_i \mid y_i, \Theta) = \frac{1}{(2\pi)^{d/2}\, |A|^{-1/2}} \exp\Bigl(-\tfrac{1}{2} \|x_i - \mu_{y_i}\|_A^2\Bigr);
\]
If $x_i$ and $\mu_{y_i}$ are probability distributions, and $D$ is the KL-divergence,
$D(x_i, \mu_{y_i}) = \sum_{m=1}^{d} x_{im} \log \frac{x_{im}}{\mu_{y_i m}}$, then the
cluster conditional probability is a multinomial distribution (20).
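The two distortion measures just listed can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names are hypothetical, the weight matrix $A$ is passed as a list of rows, and the KL-divergence follows the common convention that terms with $x_{im} = 0$ contribute zero. The last function evaluates the unnormalized form $\exp(-D)$ of Equation 7.19 and omits the normalizer $Z_\Theta$.

```python
import math

def squared_mahalanobis(x, mu, A):
    """||x - mu||_A^2 = (x - mu)^T A (x - mu), with A given as a list of rows."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    n = len(diff)
    return sum(diff[r] * A[r][c] * diff[c] for r in range(n) for c in range(n))

def kl_divergence(x, mu):
    """sum_m x_m * log(x_m / mu_m); zero entries of x are skipped (assumed convention)."""
    return sum(xm * math.log(xm / mm) for xm, mm in zip(x, mu) if xm > 0)

def unnormalized_density(distortion):
    """exp(-D) as in Equation 7.19, without the normalizer Z_Theta."""
    return math.exp(-distortion)

# With A equal to the identity, the distortion is the plain squared Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
d_euclid = squared_mahalanobis([1.0, 2.0], [0.0, 0.0], identity)

# Identical distributions have zero KL-divergence.
d_kl_same = kl_divergence([0.5, 0.5], [0.5, 0.5])
d_kl_diff = kl_divergence([1.0, 0.0], [0.5, 0.5])
```

In both cases a point that coincides with its cluster representative has zero distortion and therefore the highest unnormalized density, and the density decays as the distortion grows.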
The relation in Equation 7.19 holds even if $D$ is not a Bregman divergence
but a directional distance measure such as cosine distance, which is useful in
text clustering. Then, if $x_i$ and $\mu_{y_i}$ are vectors of unit length and $D$ is one
minus the dot-product of the vectors,
$D(x_i, \mu_{y_i}) = 1 - \frac{\sum_{m=1}^{d} x_{im}\, \mu_{y_i m}}{\|x_i\|\, \|\mu_{y_i}\|}$, then
the cluster conditional probability is a von Mises-Fisher (vMF) distribution
with unit concentration parameter (2), which is the spherical analog of a
Gaussian.
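As a small illustration of the directional case, the sketch below computes one minus the normalized dot-product of two vectors. The function name is hypothetical, and the explicit division by the norms is kept so that the sketch also accepts vectors that are not already of unit length; for unit vectors it reduces to one minus the plain dot-product.

```python
import math

def cosine_distance(x, mu):
    """1 - (x . mu) / (||x|| * ||mu||); assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, mu))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_mu = math.sqrt(sum(b * b for b in mu))
    return 1.0 - dot / (norm_x * norm_mu)

same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])
```

The distance depends only on the angle between the vectors, not on their lengths, which is why it suits text data where document length should not affect cluster membership.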
Substituting Equation 7.19 into Equation 7.16 and taking logarithms gives the
following clustering objective function; minimizing it is equivalent to
maximizing the joint probability over the HMRF in Equation 7.16: