Fig. 3.16. Non-negative matrix factorization: (a) architecture of the network; (b) reconstruction of a face from its hidden representation; also shown are the extracted basis vectors and the activities of the hidden units (images from [137]).
Non-negative Matrix Factorization.
Lee and Seung [137] recently introduced a generative data model that can be interpreted in terms of vertical feedback. They decompose an $n \times m$ non-negative matrix $V$ approximately into non-negative matrix factors: $V \approx WH$. The $m$ columns of $V$ consist of $n$-dimensional data vectors. The $r$ columns of $W$ contain basis vectors of dimension $n$. Each data vector is represented by a column of $H$ that contains $r$ coefficients. This corresponds to a two-layered neural network which represents the data vector in a visible layer and the coefficients in a hidden layer. The matrix $W$ describes the weights that connect both layers. This situation is illustrated in Figure 3.16(a).
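The matrix shapes involved can be sketched as follows; the concrete sizes (a $19 \times 19 = 361$ pixel grid, 2,429 faces, and $r = 49$ basis vectors) are illustrative choices, not prescribed by the text:

```python
import numpy as np

# Illustrative sketch of the shapes in V ~= W @ H.
n, m, r = 361, 2429, 49   # assumed: 19x19 pixels, 2429 faces, 49 basis vectors
rng = np.random.default_rng(0)

V = rng.random((n, m))    # non-negative data: one n-dimensional vector per column
W = rng.random((n, r))    # r basis vectors of dimension n (the connection weights)
H = rng.random((r, m))    # r hidden coefficients per data vector

# The reconstruction WH has the same shape as the data matrix V.
reconstruction = W @ H
```

In the network view, each column of $H$ is the hidden-layer activity for one data vector, and multiplying by $W$ maps it back to the visible layer.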
One measure of the factorization quality is the square of the Euclidean distance $\|A - B\|^2 = \sum_{ij} (A_{ij} - B_{ij})^2$ between $V$ and its reconstruction $WH$. $\|V - WH\|^2$ is minimized by the update rules:
$$H_{a\mu} \leftarrow H_{a\mu} \frac{(W^T V)_{a\mu}}{(W^T W H)_{a\mu}}; \qquad W_{ia} \leftarrow W_{ia} \frac{(V H^T)_{ia}}{(W H H^T)_{ia}}.$$
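These multiplicative updates translate directly into NumPy. The sketch below is a minimal implementation under stated assumptions: the small constant `eps` guarding against division by zero and the random initialization scheme are implementation details not specified in the text.

```python
import numpy as np

def nmf_euclidean(V, r, iters=200, eps=1e-9, seed=0):
    """Multiplicative updates minimizing ||V - WH||^2.

    Minimal sketch; `eps` (division-by-zero guard) is an assumption.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r))
    H = rng.random((r, m))
    for _ in range(iters):
        # H_au <- H_au * (W^T V)_au / (W^T W H)_au
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # W_ia <- W_ia * (V H^T)_ia / (W H H^T)_ia
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because each step multiplies by a non-negative factor, $W$ and $H$ stay non-negative throughout, and the squared error is non-increasing from iteration to iteration.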
Another measure is the divergence $D(A \,\|\, B) = \sum_{ij} \left( A_{ij} \log \frac{A_{ij}}{B_{ij}} - A_{ij} + B_{ij} \right)$. $D(V \,\|\, WH)$ is minimized by:
$$H_{a\mu} \leftarrow H_{a\mu} \frac{\sum_i W_{ia} V_{i\mu} / (WH)_{i\mu}}{\sum_k W_{ka}}; \qquad W_{ia} \leftarrow W_{ia} \frac{\sum_\mu H_{a\mu} V_{i\mu} / (WH)_{i\mu}}{\sum_\nu H_{a\nu}}.$$
Lee and Seung [138] proved that these update rules find local minima of the respective objective functions. Each update consists of a multiplicative factor that is unity if $V = WH$. The multiplicative update does not change the sign of $W$ or $H$. Hence, if they are initialized to positive values, no further constraints are necessary to enforce their non-negativity.
The model was applied to a dataset that contained 2,429 normalized faces. The frontal views were hand-aligned in a $19 \times 19$ grid. Pixel intensities were linearly scaled such that their mean and standard deviation equaled 0.25 and were then clipped to the interval $[0, 1]$, where a value of zero corresponds to white. The matrices $W$ and $H$ were initialized randomly. The basis vectors that were present after 500 iterations of the update rule (minimizing the divergence) are shown in Figure 3.16(b). They consist of localized patches of high values that resemble typical