$$
\min_{C_1,\ldots,C_k,\;c_1,\ldots,c_k}\;\sum_{l=1}^{k}\,\sum_{i \in C_l}\left\| x_i - c_l \right\|_2^2
\qquad (8.38)
$$

where C_1, ..., C_k ⊆ {1, ..., m} denote the clusters and c_1, ..., c_k ∈ ℝ^r the cluster centers to be minimized. It is straightforward to verify that if we are given clusters and keep them fixed in the optimization, corresponding optimal cluster centers are given by the centroids of the clusters. Conversely, given cluster centers c_l, l ∈ {1, ..., k}, corresponding optimal clusters may be obtained by assigning each x_i, i ∈ {1, ..., m}, to its nearest neighbor among the c_l. Hence, either the
clusters or the centers may be eliminated from the objective, giving rise to an
equivalent problem. A major drawback is that since ( 8.38 ) is not convex, we must
settle for a local optimization. Yet, as described above, given a specific choice of
either clusters or centers, the resulting partial optimization problem is straightforward to solve. This insight provides us with a fairly easy local optimization method
based on alternating partial optimization.
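The alternating scheme just described can be sketched as follows; this is a minimal illustration (the function name and random initialization from data columns are choices made here, not notation from the text), assuming the data points x_i are the m columns of an r × m array X:

```python
import numpy as np

def kmeans_alternating(X, k, n_iter=100, seed=0):
    """Alternating partial optimization of the k-means objective (8.38):
    fix the centers -> assign each point to its nearest center;
    fix the clusters -> recompute each center as the cluster centroid."""
    rng = np.random.default_rng(seed)
    r, m = X.shape
    # initialize centers as k distinct data columns chosen at random
    centers = X[:, rng.choice(m, size=k, replace=False)].copy()
    labels = np.full(m, -1)
    for _ in range(n_iter):
        # partial optimization 1: optimal clusters for fixed centers
        d2 = ((X[:, :, None] - centers[:, None, :]) ** 2).sum(axis=0)  # (m, k)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stable: local optimum reached
        labels = new_labels
        # partial optimization 2: optimal centers (centroids) for fixed clusters
        for l in range(k):
            mask = labels == l
            if mask.any():
                centers[:, l] = X[:, mask].mean(axis=1)
    return centers, labels
```

Each iteration performs the two partial optimizations in turn, so the objective is non-increasing; as noted above, only a local optimum is guaranteed.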
Interestingly enough, ( 8.38 ) may be equivalently rewritten as the matrix factor-
ization problem
$$
\min_{C \in \mathbb{R}^{r \times k},\; B \in \mathbb{R}^{m \times k}} \left\| X - C B^{\mathsf{T}} \right\|_F^2
\quad \text{s.t.}\quad b_{ij} \in \{0,1\},\;\; \sum_{s=1}^{k} b_{is} = 1
\quad \forall\, i \in \{1,\ldots,m\},\; j \in \{1,\ldots,k\}.
\qquad (8.39)
$$
(By virtue of the Moore-Penrose inverse of B , it also becomes clear why partially
optimal cluster centers are given by the centroids of the clusters.)
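A short derivation sketch of that remark, assuming B is feasible with no empty cluster (so that BᵀB is invertible): minimizing ‖X − CBᵀ‖²_F over C for fixed B via the normal equations gives

$$
C = X B \left( B^{\mathsf{T}} B \right)^{-1} = X \left( B^{\mathsf{T}} \right)^{+},
$$

and since each row of B is a unit coordinate vector, BᵀB = diag(|C_1|, ..., |C_k|) simply counts the cluster sizes, so the l-th column of C is (1/|C_l|) Σ_{i ∈ C_l} x_i, precisely the centroid of cluster C_l.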
Not only does this enable us to express the entire procedure in terms of linear algebra, but also, for the generic case that the desired number of clusters is smaller than the rank of X, it provides us with a clue as to how to find a reasonable initial guess by virtue of relaxation: omitting the constraints yields
$$
\min_{C \in \mathbb{R}^{r \times k},\; B \in \mathbb{R}^{m \times k}} \left\| X - C B^{\mathsf{T}} \right\|_F^2 ,
\qquad (8.40)
$$
which is nothing but an ordinary low-rank factorization problem that may be solved
by means of SVD.
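As a sketch of this relaxation step (the function name is an illustrative assumption, and the split of the singular values between the two factors is one conventional choice among many, since CBᵀ is invariant under such rescalings):

```python
import numpy as np

def relaxed_factorization(X, k):
    """Rank-k minimizer of the unconstrained problem (8.40) via truncated SVD
    (Eckart-Young): X ~= U_k S_k V_k^T, with C = U_k S_k and B = V_k."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    C = U[:, :k] * s[:k]   # (r, k): left singular vectors scaled by singular values
    B = Vt[:k, :].T        # (m, k): top-k right singular vectors
    return C, B
```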
Now, given an optimizer C, B of (8.40), how are we to construct an initial guess C_0, B_0 for (8.39)? A straightforward way that bears the advantage of circumventing normalization issues consists of taking B_0 to be a feasible matrix that best approximates B, which is obtained by taking each row of B_0 to be zero everywhere except for the element at which the corresponding row of B attains its largest value (the nongeneric case of a draw is handled by picking one of the possibilities at random), where we assign the value 1.
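This row-wise rounding can be sketched as follows (the function name is illustrative; note that, unlike the random tie-breaking described above, `np.argmax` resolves the nongeneric case of a draw by taking the first maximal index):

```python
import numpy as np

def round_to_feasible(B):
    """One-hot rounding of B: each row of B_0 is zero everywhere except
    for a 1 at the column where the corresponding row of B is largest."""
    m, k = B.shape
    B0 = np.zeros((m, k))
    B0[np.arange(m), B.argmax(axis=1)] = 1.0
    return B0
```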
Herein, the circumstance that, in many of the applications previously described in this chapter, X has orthonormal rows comes in particularly handy, since this spares