Fig. 3.16. Non-negative matrix factorization: (a) architecture of the network; (b) reconstruction of a face from its hidden representation; also shown are the extracted basis vectors and the activities of the hidden units (images from [137]).
Non-negative Matrix Factorization. Lee and Seung [137] recently introduced a generative data model that can be interpreted in terms of vertical feedback. They decompose an n × m non-negative matrix V approximately into non-negative matrix factors: V ≈ WH. The m columns of V consist of n-dimensional data vectors. The r columns of W contain basis vectors of dimension n. Each data vector is represented by a column of H that contains r coefficients. This corresponds to a two-layered neural network which represents the data vector in a visible layer and the coefficients in a hidden layer. The matrix W describes the weights that connect both layers. This situation is illustrated in Figure 3.16(a).
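The factorization can be sketched in a few lines of numpy. The sizes below are illustrative (only the 19 × 19 input dimension is taken from the experiment described later in this section); V is constructed so that the model can represent it exactly:

```python
import numpy as np

# Sketch of the factorization V ≈ W H, with dimensions as in the text:
# V is n x m (m data vectors of dimension n), W is n x r (basis vectors),
# H is r x m (one column of r coefficients per data vector).
rng = np.random.default_rng(0)
n, m, r = 19 * 19, 100, 49      # illustrative sizes, not from the text
W = rng.random((n, r))          # weights between hidden and visible layer
H = rng.random((r, m))          # hidden-layer activities (coefficients)
V = W @ H                       # a data matrix the model represents exactly

# A single data vector and its reconstruction from the hidden code:
v = V[:, 0]
h = H[:, 0]
reconstruction = W @ h
assert np.allclose(reconstruction, v)
```

In the two-layer reading, `W @ h` is the top-down (vertical feedback) pass that reconstructs the visible layer from the hidden activities.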
One measure of the factorization quality is the square of the Euclidean distance ‖A − B‖² = Σ_ij (A_ij − B_ij)² between V and its reconstruction WH. ‖V − WH‖² is minimized by the update rules:

    H_aµ ← H_aµ (WᵀV)_aµ / (WᵀWH)_aµ ;    W_ia ← W_ia (VHᵀ)_ia / (WHHᵀ)_ia .
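A minimal numpy sketch of these Euclidean-cost updates (the small `eps` guards the divisions against zero denominators and is an added safeguard, not part of the original rules; sizes are illustrative):

```python
import numpy as np

eps = 1e-12
rng = np.random.default_rng(1)
n, m, r = 20, 30, 5
V = rng.random((n, m))
W = rng.random((n, r))
H = rng.random((r, m))

def euclidean_cost(V, W, H):
    # Squared Euclidean distance ||V - WH||^2
    return np.sum((V - W @ H) ** 2)

before = euclidean_cost(V, W, H)
for _ in range(50):
    # Multiplicative updates for the Euclidean objective
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
after = euclidean_cost(V, W, H)
assert after <= before   # the updates never increase the cost
```

Each update multiplies the current entry by a ratio that equals one when V = WH, so a perfect factorization is a fixed point.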
Another measure is the divergence D(A ‖ B) = Σ_ij ( A_ij log(A_ij / B_ij) − A_ij + B_ij ). D(V ‖ WH) is minimized by:

    H_aµ ← H_aµ ( Σ_i W_ia V_iµ / (WH)_iµ ) / ( Σ_k W_ka ) ;    W_ia ← W_ia ( Σ_µ H_aµ V_iµ / (WH)_iµ ) / ( Σ_ν H_aν ) .
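The divergence-minimizing updates translate directly into matrix form (again with an added `eps` safeguard and illustrative sizes):

```python
import numpy as np

eps = 1e-12
rng = np.random.default_rng(2)
V = rng.random((20, 30))
W = rng.random((20, 5))
H = rng.random((5, 30))

def divergence(V, W, H):
    # D(V || WH) = sum_ij ( V_ij log(V_ij / (WH)_ij) - V_ij + (WH)_ij )
    R = W @ H
    return np.sum(V * np.log((V + eps) / (R + eps)) - V + R)

before = divergence(V, W, H)
for _ in range(50):
    # H_au <- H_au * (sum_i W_ia V_iu/(WH)_iu) / (sum_k W_ka)
    H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    # W_ia <- W_ia * (sum_u H_au V_iu/(WH)_iu) / (sum_v H_av)
    W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
after = divergence(V, W, H)
assert after <= before   # the divergence is non-increasing
```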
Lee and Seung [138] proved that these update rules find local minima of the re-
spective objective functions. Each update consists of a multiplicative factor that is
unity if V = WH . The multiplicative update does not change the sign of W or H .
Hence, if they are initialized to positive values, no further constraints are necessary
to enforce their non-negativity.
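This sign-preservation property is easy to check numerically: starting from strictly positive matrices and iterating the (Euclidean) updates, every entry stays non-negative without any explicit projection step (a sketch with illustrative sizes):

```python
import numpy as np

eps = 1e-12
rng = np.random.default_rng(3)
V = rng.random((15, 25))
W = rng.random((15, 4)) + 0.1   # strictly positive initialization
H = rng.random((4, 25)) + 0.1

for _ in range(100):
    # Each factor is a ratio of non-negative quantities, so the
    # multiplicative update cannot flip the sign of any entry.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

assert (W >= 0).all() and (H >= 0).all()
```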
The model was applied to a dataset that contained 2,429 normalized faces. The frontal views were hand-aligned in a 19 × 19 grid. Pixel intensities were linearly scaled such that their mean and standard deviation equaled 0.25 and were then clipped to the interval [0, 1], where a value of zero corresponds to white. The matrices W and H were initialized randomly. The basis vectors that were present after 500 iterations of the update rule (minimizing the divergence) are shown in Figure 3.16(b). They consist of localized patches of high values that resemble typical