7.5 Learning Distance Function with Constraints
In this section, we discuss two different approaches to using constraints for distance metric learning in constrained clustering, both of which can cluster text data using Euclidean distance on L2-normalized vectors.
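The remark about L2-normalized data rests on a standard identity: on unit vectors, squared Euclidean distance is a monotone function of cosine similarity, so Euclidean clustering behaves like cosine-based clustering. A minimal NumPy sketch (illustrative, not from the chapter):

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    return v / np.linalg.norm(v)

# On unit vectors x, y: ||x - y||^2 = 2 - 2 * cos(x, y).
x = l2_normalize(np.array([3.0, 4.0]))
y = l2_normalize(np.array([1.0, 0.0]))

sq_euclidean = np.sum((x - y) ** 2)
cosine = float(np.dot(x, y))
assert np.isclose(sq_euclidean, 2.0 - 2.0 * cosine)
```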
7.5.1 Generalized Mahalanobis Distance Learning
Xing et al. (48) proposed a formulation for learning a parameterized Mahalanobis metric of the form $d(x_1, x_2) = \sqrt{(x_1 - x_2)^T A (x_1 - x_2)}$ from must-link and cannot-link constraints. They proposed the following semi-definite program (SDP) for the problem:
\[
\min_A \sum_{(x_i, x_j) \in ML} (x_i - x_j)^T A (x_i - x_j) \tag{7.11}
\]
\[
\text{s.t. } \sum_{(x_i, x_j) \in CL} \|x_i - x_j\|_A \geq 1, \qquad A \succeq 0
\]
Equation 7.11 learns A such that the must-link instances are brought closer
together, while ensuring that the cannot-link instances are kept apart (SDP
constraint on CL set) and the underlying metric still satisfies the triangle
inequality (SDP constraint on A). Xing et al. (48) proposed an equivalent
formulation of Equation 7.11:
\[
g(A) = \max_A \sum_{(x_i, x_j) \in CL} \|x_i - x_j\|_A \tag{7.12}
\]
\[
\text{s.t. } f(A) = \sum_{(x_i, x_j) \in ML} \|x_i - x_j\|_A^2 \leq 1 \qquad (C1) \tag{7.13}
\]
\[
A \succeq 0 \qquad (C2)
\]
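The parameterized Mahalanobis distance that these formulations optimize can be written in a few lines of NumPy. This is an illustrative sketch, not code from the chapter; the function name is our own:

```python
import numpy as np

def mahalanobis(x1, x2, A):
    """Parameterized Mahalanobis distance:
    d_A(x1, x2) = sqrt((x1 - x2)^T A (x1 - x2)).

    A must be positive semi-definite for d_A to be a valid
    pseudo-metric; A = I recovers the Euclidean distance.
    """
    diff = x1 - x2
    return float(np.sqrt(diff @ A @ diff))

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, 1.0])
# With A = I, the Mahalanobis distance equals the Euclidean distance.
assert np.isclose(mahalanobis(x1, x2, np.eye(2)), np.linalg.norm(x1 - x2))
```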
Xing et al. (48) optimized Equation 7.12 using an alternating maximization algorithm with two steps: (1) gradient ascent, to optimize the objective; and (2) an iterated projection algorithm, to satisfy the inequality constraints. De Bie et al. (7) used a variant of Linear Discriminant Analysis (LDA) to find the Mahalanobis metric from constraints more efficiently than by solving an SDP. Experiments in both papers showed that clustering with a distance metric learned from constraints outperformed clustering without distance metric learning.
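The two-step alternating scheme (gradient ascent on the cannot-link objective, then iterated projection onto the must-link constraint set and the PSD cone) can be sketched as follows. The step size, iteration counts, and function name are illustrative assumptions, not the authors' implementation; note that f(A) is linear in A, so its constraint set is a half-space with a closed-form projection:

```python
import numpy as np

def learn_metric(ml_pairs, cl_pairs, dim, lr=0.1, iters=200):
    """Sketch of alternating maximization for Equations 7.12-7.13."""
    A = np.eye(dim)
    # f(A) = sum_ML (xi - xj)^T A (xi - xj) = <A, S>, with
    # S = sum_ML (xi - xj)(xi - xj)^T, so C1 is the half-space <A, S> <= 1.
    S = sum(np.outer(xi - xj, xi - xj) for xi, xj in ml_pairs)
    for _ in range(iters):
        # Step 1: gradient ascent on g(A) = sum_CL sqrt(d^T A d),
        # whose gradient is sum_CL (d d^T) / (2 sqrt(d^T A d)).
        grad = np.zeros_like(A)
        for xi, xj in cl_pairs:
            d = xi - xj
            val = d @ A @ d
            if val > 1e-12:
                grad += np.outer(d, d) / (2.0 * np.sqrt(val))
        A = A + lr * grad
        # Step 2: iterated projection onto the two constraint sets.
        for _ in range(50):
            # Project onto the half-space {<A, S> <= 1} (constraint C1).
            excess = np.sum(A * S) - 1.0
            if excess > 0:
                A = A - (excess / np.sum(S * S)) * S
            # Project onto the PSD cone (constraint C2) by
            # clipping negative eigenvalues to zero.
            w, V = np.linalg.eigh(A)
            A = (V * np.clip(w, 0.0, None)) @ V.T
    return A
```

On a toy problem where must-link pairs differ along one axis and cannot-link pairs along another, the learned A stretches the cannot-link direction while keeping the must-link constraint satisfied.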