Decomposition in Transition: Adaptive Matrix Factorization - Realtime Data Mining


Fig. 8.1 The best approximating one-dimensional subspace (solid line) to a set of data residing in $\mathbb{R}^3$. Projections are indicated by dotted lines. Labels in the figure: $u_1$, $u_2$, $u_3$, $u_4$.

This manifold is chosen such that the mean-squared error resulting from the projection is minimal among all possible choices. Mathematically, the problem may be stated as follows:

$$\min_{X \in \mathbb{R}^{n_p \times d},\; b \in \mathbb{R}^{n_p},\; y_1, \ldots, y_{n_s} \in \mathbb{R}^{d}} \; \sum_{j=1}^{n_s} \| a_j - X y_j - b \|^2, \tag{8.6}$$

where $a_1, \ldots, a_{n_s} \in \mathbb{R}^{n_p}$ denote the given data. A straightforward argument reveals that $b$ is always given by the centroid of the data, i.e., $b := \frac{1}{n_s} \sum_{j=1}^{n_s} a_j$.

Hence, assuming without loss of generality that the data are mean centered (which may always be achieved by replacing our data by $a_j - b$, $j = 1, \ldots, n_s$), the translation $b$ may always be taken to be 0. We may thus restrict ourselves to the problem of finding the best approximating subspace to a set of mean-centered data:

$$\min_{X \in \mathbb{R}^{n_p \times d},\; y_1, \ldots, y_{n_s} \in \mathbb{R}^{d}} \; \sum_{j=1}^{n_s} \| a_j - X y_j \|^2 \tag{8.7}$$
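As a minimal sketch of the centering step, the following NumPy snippet computes the centroid $b$ of a data matrix whose columns are the samples $a_j$ and subtracts it from every column. The function name `center_columns` and the toy data are illustrative assumptions, not part of the original text.

```python
import numpy as np

def center_columns(A):
    """Return (A_centered, b) for data stored column-wise in A (shape n_p x n_s)."""
    A = np.asarray(A, dtype=float)
    b = A.mean(axis=1, keepdims=True)  # centroid: (1/n_s) * sum_j a_j
    return A - b, b

# Toy data (hypothetical): n_p = 2 features, n_s = 3 samples stored as columns.
A = np.array([[1.0, 2.0, 6.0],
              [0.0, 4.0, 2.0]])
A_centered, b = center_columns(A)

print(b.ravel())                # centroid of the columns
print(A_centered.mean(axis=1))  # approximately zero after centering
```

After this step the centered columns have zero mean, so the translation can be dropped from the problem.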

The Frobenius norm is defined as

$$\|A\|_F := \Big( \sum_{i=1,\, j=1}^{m,\, n} a_{ij}^2 \Big)^{1/2}, \quad A \in \mathbb{R}^{m \times n}.$$
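A short numerical check may help here. The sketch below computes the Frobenius norm directly from its entries, compares it with NumPy's built-in `np.linalg.norm(A, "fro")`, and confirms that the squared Frobenius norm equals the sum of the squared Euclidean norms of the columns, which is why a column-wise sum of squared errors such as (8.7) can be written as a single matrix norm. The helper name `frobenius_norm` and the example matrix are assumptions made for illustration.

```python
import numpy as np

def frobenius_norm(A):
    """Square root of the sum of all squared entries of A."""
    A = np.asarray(A, dtype=float)
    return np.sqrt(np.sum(A ** 2))

# Hypothetical 2 x 2 example.
A = np.array([[3.0, 4.0],
              [0.0, 0.0]])

fro = frobenius_norm(A)
# Sum of squared Euclidean norms of the columns of A.
col_sq = sum(np.linalg.norm(A[:, j]) ** 2 for j in range(A.shape[1]))

print(fro, np.linalg.norm(A, "fro"))  # both values agree
print(np.isclose(fro ** 2, col_sq))   # column-wise identity holds
```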

Summarizing our data and intrinsic variables in matrices,

$$A := [a_1, \ldots, a_{n_s}], \quad Y := [y_1, \ldots, y_{n_s}],$$

we may cast (8.7) equivalently as the matrix factorization problem

$$\min_{X \in \mathbb{R}^{n_p \times d},\; Y \in \mathbb{R}^{d \times n_s}} \|A - XY\|_F^2. \tag{8.8}$$
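The text at this point only states problem (8.8). One standard way to compute a minimizer, brought in here as an assumption rather than something this passage derives, is the truncated singular value decomposition: by the Eckart-Young theorem, the best rank-$d$ approximation of $A$ in the Frobenius norm is obtained by keeping the $d$ largest singular values. The sketch below uses hypothetical names and random toy data in $\mathbb{R}^3$ with $d = 1$, mirroring the setting of Fig. 8.1.

```python
import numpy as np

def best_rank_d_factors(A, d):
    """Factors X (n_p x d) and Y (d x n_s) from a truncated SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    X = U[:, :d]                 # orthonormal basis of the fitted subspace
    Y = s[:d, None] * Vt[:d, :]  # coordinates of each sample in that basis
    return X, Y

# Toy data (hypothetical): n_s = 50 samples in R^3, stored as columns.
rng = np.random.default_rng(0)
n_p, n_s, d = 3, 50, 1
A = rng.standard_normal((n_p, n_s))
A = A - A.mean(axis=1, keepdims=True)  # mean-center the columns

X, Y = best_rank_d_factors(A, d)
err = np.linalg.norm(A - X @ Y, "fro") ** 2

# The residual equals the energy in the discarded singular values.
s = np.linalg.svd(A, compute_uv=False)
tail = np.sum(s[d:] ** 2)
print(err, tail)
```

The factors are not unique: for any invertible $d \times d$ matrix $M$, the pair $(XM, M^{-1}Y)$ yields the same product $XY$ and hence the same objective value.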

Recalling the general framework stipulated in (8.1), (8.8) may be stated in terms of the former by assigning $f(E, F) := \|E - F\|_F^2$, $C_1 := \mathbb{R}^{n_p \times d}$, $C_2 := \mathbb{R}^{d \times n_s}$.
