Fig. 3.16. Non-negative matrix factorization: (a) architecture of the network; (b) reconstruction of a face from its hidden representation; also shown are the extracted basis vectors and the activities of the hidden units (images from [137]).
Non-negative Matrix Factorization. Lee and Seung [137] recently introduced a generative data model that can be interpreted in terms of vertical feedback. They decompose an n × m non-negative matrix V approximately into non-negative matrix factors: V ≈ WH. The m columns of V consist of n-dimensional data vectors. The r columns of W contain basis vectors of dimension n. Each data vector is represented by a column of H that contains r coefficients. This corresponds to a two-layered neural network which represents the data vector in a visible layer and the coefficients in a hidden layer. The matrix W describes the weights that connect both layers. This situation is illustrated in Figure 3.16(a).
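The factorization can be sketched in a few lines of numpy. The sizes below are illustrative (only the 19 × 19 input dimension is taken from the experiment described later in this section); V is constructed so that the model can represent it exactly:

```python
import numpy as np

# Sketch of the factorization V ≈ W H, with dimensions as in the text:
# V is n x m (m data vectors of dimension n), W is n x r (basis vectors),
# H is r x m (one column of r coefficients per data vector).
rng = np.random.default_rng(0)
n, m, r = 19 * 19, 100, 49      # illustrative sizes, not from the text
W = rng.random((n, r))          # weights between hidden and visible layer
H = rng.random((r, m))          # hidden-layer activities (coefficients)
V = W @ H                       # a data matrix the model represents exactly

# A single data vector and its reconstruction from the hidden code:
v = V[:, 0]
h = H[:, 0]
reconstruction = W @ h
assert np.allclose(reconstruction, v)
```

In the two-layer reading, `W @ h` is the top-down (vertical feedback) pass that reconstructs the visible layer from the hidden activities.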
One measure of the factorization quality is the square of the Euclidean distance ‖A − B‖² = Σ_ij (A_ij − B_ij)² between V and its reconstruction WH. ‖V − WH‖² is minimized by the update rules:

    H_aµ ← H_aµ (WᵀV)_aµ / (WᵀWH)_aµ ;    W_ia ← W_ia (VHᵀ)_ia / (WHHᵀ)_ia .
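A minimal numpy sketch of these Euclidean-cost updates (the small `eps` guards the divisions against zero denominators and is an added safeguard, not part of the original rules; sizes are illustrative):

```python
import numpy as np

eps = 1e-12
rng = np.random.default_rng(1)
n, m, r = 20, 30, 5
V = rng.random((n, m))
W = rng.random((n, r))
H = rng.random((r, m))

def euclidean_cost(V, W, H):
    # Squared Euclidean distance ||V - WH||^2
    return np.sum((V - W @ H) ** 2)

before = euclidean_cost(V, W, H)
for _ in range(50):
    # Multiplicative updates for the Euclidean objective
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
after = euclidean_cost(V, W, H)
assert after <= before   # the updates never increase the cost
```

Each update multiplies the current entry by a ratio that equals one when V = WH, so a perfect factorization is a fixed point.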
Another measure is the divergence D(A ‖ B) = Σ_ij ( A_ij log(A_ij / B_ij) − A_ij + B_ij ). D(V ‖ WH) is minimized by:

    H_aµ ← H_aµ ( Σ_i W_ia V_iµ / (WH)_iµ ) / ( Σ_k W_ka ) ;    W_ia ← W_ia ( Σ_µ H_aµ V_iµ / (WH)_iµ ) / ( Σ_ν H_aν ) .
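The divergence-minimizing updates translate directly into matrix form (again with an added `eps` safeguard and illustrative sizes):

```python
import numpy as np

eps = 1e-12
rng = np.random.default_rng(2)
V = rng.random((20, 30))
W = rng.random((20, 5))
H = rng.random((5, 30))

def divergence(V, W, H):
    # D(V || WH) = sum_ij ( V_ij log(V_ij / (WH)_ij) - V_ij + (WH)_ij )
    R = W @ H
    return np.sum(V * np.log((V + eps) / (R + eps)) - V + R)

before = divergence(V, W, H)
for _ in range(50):
    # H_au <- H_au * (sum_i W_ia V_iu/(WH)_iu) / (sum_k W_ka)
    H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    # W_ia <- W_ia * (sum_u H_au V_iu/(WH)_iu) / (sum_v H_av)
    W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
after = divergence(V, W, H)
assert after <= before   # the divergence is non-increasing
```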
Lee and Seung [138] proved that these update rules find local minima of the re-
spective objective functions. Each update consists of a multiplicative factor that is
unity if V = WH . The multiplicative update does not change the sign of W or H .
Hence, if they are initialized to positive values, no further constraints are necessary
to enforce their non-negativity.
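This sign-preservation property is easy to check numerically: starting from strictly positive matrices and iterating the (Euclidean) updates, every entry stays non-negative without any explicit projection step (a sketch with illustrative sizes):

```python
import numpy as np

eps = 1e-12
rng = np.random.default_rng(3)
V = rng.random((15, 25))
W = rng.random((15, 4)) + 0.1   # strictly positive initialization
H = rng.random((4, 25)) + 0.1

for _ in range(100):
    # Each factor is a ratio of non-negative quantities, so the
    # multiplicative update cannot flip the sign of any entry.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

assert (W >= 0).all() and (H >= 0).all()
```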
The model was applied to a dataset that contained 2,429 normalized faces. The frontal views were hand-aligned in a 19 × 19 grid. Pixel intensities were linearly scaled such that their mean and standard deviation equaled 0.25 and were then clipped to the interval [0, 1], where a value of zero corresponds to white. The matrices W and H were initialized randomly. The basis vectors that were present after 500 iterations of the update rule (minimizing the divergence) are shown in Figure 3.16(b). They consist of localized patches of high values that resemble typical