Due to the nonnegativity of X and Y, large values x_ij indicate that the j-th basis vector essentially represents the i-th element of the data. Correspondingly, large values y_ij indicate a major contribution of the i-th basis vector to the column a_j.
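As a toy illustration of this reading (the matrices below are invented for the purpose and are not from the text), consider a small factorization in NumPy:

import numpy as np

# Hypothetical 3x2 factorization A ~= X @ Y; all numbers are made up.
X = np.array([[0.9, 0.0],    # x_11 = 0.9: basis vector 1 mainly represents element 1
              [0.1, 0.2],
              [0.0, 0.8]])   # x_32 = 0.8: basis vector 2 mainly represents element 3
Y = np.array([[1.0, 0.3],    # y_11 = 1.0: basis vector 1 dominates column a_1
              [0.0, 0.7]])   # y_22 = 0.7: basis vector 2 dominates column a_2
A = X @ Y
print(A)                     # [[0.9, 0.27], [0.1, 0.17], [0.0, 0.56]]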
Compared with SVD, we of course need to trade the advantages off against the disadvantages: NMF typically exhibits a worse rank-k approximation than SVD. Moreover, as opposed to SVD, the approximation is not unique. While the left singular vectors of SVD are mutually orthogonal, this does not hold in general for the basis vectors x_i of NMF.
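To see the approximation-quality gap concretely, here is a quick numerical check. This is a sketch assuming NumPy and scikit-learn are available; scikit-learn's NMF solver is not the ALS procedure discussed below, but any rank-k NMF obeys the same bound.

import numpy as np
from sklearn.decomposition import NMF  # assumption: scikit-learn is installed

rng = np.random.default_rng(0)
A = rng.random((50, 40))   # random nonnegative test matrix
k = 5

# Best possible rank-k Frobenius error (Eckart-Young): tail singular values of A
s = np.linalg.svd(A, compute_uv=False)
svd_err = np.sqrt(np.sum(s[k:] ** 2))

# Rank-k NMF error: never below the SVD bound
model = NMF(n_components=k, init="random", random_state=0, max_iter=500)
X = model.fit_transform(A)                        # m x k factor
nmf_err = np.linalg.norm(A - X @ model.components_, "fro")

print(f"optimal rank-{k} (SVD) error: {svd_err:.4f}")
print(f"NMF rank-{k} error:           {nmf_err:.4f}")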
A major drawback is the lack of efficient computational methods for NMF that are suitable for high-dimensional problems. While there are technically mature standard procedures for computing the SVD, such as the Lanczos method, as well as adaptive procedures such as Brand's, all with proven convergence, methods for NMF are still in their infancy and poorly understood.
Even though problem (8.28) is convex in each of X and Y separately (for fixed X, minimizing over Y is an ordinary least-squares problem, and vice versa), it is not convex in both variables taken together, which renders determining a globally optimal solution difficult. For most existing algorithms, convergence to a locally optimal solution has been shown at best.
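A small experiment illustrates this non-convexity. Again a sketch using scikit-learn's NMF; the exact error values depend on the data and solver, but different random initializations typically land in different local optima:

import numpy as np
from sklearn.decomposition import NMF

A = np.random.default_rng(1).random((30, 20))
for seed in (0, 1, 2):   # vary only the initialization
    m = NMF(n_components=4, init="random", random_state=seed, max_iter=1000)
    X = m.fit_transform(A)
    err = np.linalg.norm(A - X @ m.components_, "fro")
    print(f"seed {seed}: error {err:.6f}")   # errors typically differ slightly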
Most NMF optimization procedures follow the EM (expectation-maximization) principle in alternately fixing one of the variables and obtaining a new iterate by optimizing with respect to the other variable.
A typical procedure is the ALS algorithm (alternating least squares), which is considered an important practical tool for computing an NMF. It should be noted that we may introduce a column normalization immediately after step 6 so as to ensure that each column of X sums to 1. With regard to the factorization of transition probabilities discussed in the previous section, we thus automatically fulfill conditions (8.26) and (8.27); hence, stochasticity of the factorized matrix is preserved.
Algorithm 8.3 ALS
Input: positive matrix A ∈ R^{m×n}
Output: positive factor matrices X ∈ R^{m×k}, Y ∈ R^{k×n}
1: initialize X with small random values
2: repeat
3:    Y := (X^T X)^{-1} X^T A = X^+ A
4:    Y := [Y]_+
5:    X := A Y^T (Y Y^T)^{-1} = A Y^+
6:    X := [X]_+
7: until convergence
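Below is a minimal NumPy sketch of Algorithm 8.3. The function name nmf_als, the fixed iteration count used in place of a real convergence test, and the normalize flag (the optional column normalization after step 6 mentioned above) are our own choices, not from the text; np.linalg.lstsq computes the same least-squares updates as steps 3 and 5 without forming (X^T X)^{-1} explicitly, which is numerically more stable.

import numpy as np

def nmf_als(A, k, n_iter=200, eps=1e-9, normalize=False, seed=0):
    """ALS sketch for NMF: A (m x n, nonnegative) ~= X @ Y with X, Y >= 0."""
    m, n = A.shape
    X = np.random.default_rng(seed).random((m, k)) * 0.1  # step 1: small random init
    for _ in range(n_iter):                               # step 2: repeat
        Y, *_ = np.linalg.lstsq(X, A, rcond=None)         # step 3: Y := X^+ A
        Y = np.maximum(Y, 0)                              # step 4: Y := [Y]_+
        Xt, *_ = np.linalg.lstsq(Y.T, A.T, rcond=None)    # step 5: X := A Y^+
        X = np.maximum(Xt.T, 0)                           # step 6: X := [X]_+
        if normalize:
            # optional normalization after step 6: each column of X sums to 1
            X = X / np.maximum(X.sum(axis=0), eps)
    return X, Y

# usage: factor a random nonnegative matrix and report the residual
A = np.random.default_rng(1).random((20, 15))
X, Y = nmf_als(A, k=4)
print(np.linalg.norm(A - X @ Y, "fro"))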
Despite its good practical performance, we cannot guarantee even local convergence of ALS. There is a vast variety of further algorithms for NMF. At present, researchers are working flat out to make these methods suitable for high-dimensional problems. The results, however, are still hard to assess.