Data Reduction - Data Preprocessing in Data Mining


$$\sum_{s} d_{rs}^2 = N b_{rr} + T$$

$$\sum_{r} \sum_{s} d_{rs}^2 = 2NT$$

When we define

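$$d_{\cdot s}^2 = \frac{1}{N} \sum_{r} d_{rs}^2, \quad d_{r \cdot}^2 = \frac{1}{N} \sum_{s} d_{rs}^2, \quad d_{\cdot \cdot}^2 = \frac{1}{N^2} \sum_{r} \sum_{s} d_{rs}^2$$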

and using the first equation, we get

$$b_{rs} = \frac{1}{2} \left( d_{r \cdot}^2 + d_{\cdot s}^2 - d_{\cdot \cdot}^2 - d_{rs}^2 \right)$$
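The centering formula for $b_{rs}$ can be checked numerically. The following is a minimal sketch, assuming NumPy and a small random data set centered at the origin; the function name `gram_from_sq_distances` and the data are illustrative and not from the text.

```python
import numpy as np

def gram_from_sq_distances(D2):
    """Recover b_rs from squared distances d_rs^2 using the row, column, and overall means."""
    row_mean = D2.mean(axis=1, keepdims=True)   # d_{r.}^2, one value per r
    col_mean = D2.mean(axis=0, keepdims=True)   # d_{.s}^2, one value per s
    total_mean = D2.mean()                      # d_{..}^2
    return 0.5 * (row_mean + col_mean - total_mean - D2)

# Hypothetical data: 6 instances in 3 dimensions, centered at the origin.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
X -= X.mean(axis=0)

# Squared Euclidean distances between all pairs of instances.
diff = X[:, None, :] - X[None, :, :]
D2 = (diff ** 2).sum(axis=-1)

B = gram_from_sq_distances(D2)
gram_matches = np.allclose(B, X @ X.T)   # B should equal X X^T for centered X
```

The comparison against `X @ X.T` only holds because the data are centered; for uncentered data the formula returns the inner products of the centered instances.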

Having now calculated $b_{rs}$ and knowing that $B = XX^T$, we look for an approximation. We know from the spectral decomposition that $CD^{1/2}$ can be used as an approximation for $X$, where $C$ is the matrix whose columns are the eigenvectors of $B$ and $D^{1/2}$ is a diagonal matrix with the square roots of the eigenvalues on the diagonal.

Looking at the eigenvalues of $B$, we decide on a dimensionality $k$ lower than $d$. Let us say $c_j$ are the eigenvectors with $\lambda_j$ as the corresponding eigenvalues. Note that $c_j$ is $N$-dimensional. Then we get the new dimensions as

$$z_j^t = \sqrt{\lambda_j} \, c_j^t, \quad j = 1, \ldots, k, \quad t = 1, \ldots, N$$

That is, the new coordinates of instance $t$ are given by the $t$th elements of the eigenvectors $c_j$, $j = 1, \ldots, k$, after normalization.
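The whole procedure, from squared distances to the $k$-dimensional coordinates, can be sketched in a few lines. This is a minimal illustration assuming NumPy; the names `classical_mds` and `squared_distances` and the random data are hypothetical, and negative eigenvalues caused by rounding are clipped to zero as a practical assumption.

```python
import numpy as np

def classical_mds(D2, k):
    """Embed N instances in k dimensions from an N x N matrix of squared distances."""
    row_mean = D2.mean(axis=1, keepdims=True)
    col_mean = D2.mean(axis=0, keepdims=True)
    B = 0.5 * (row_mean + col_mean - D2.mean() - D2)   # b_rs from the centering formula

    eigvals, eigvecs = np.linalg.eigh(B)               # B is symmetric; eigenvalues ascending
    top = np.argsort(eigvals)[::-1][:k]                # indices of the k largest eigenvalues
    lam = np.clip(eigvals[top], 0.0, None)             # guard against tiny negative values
    return eigvecs[:, top] * np.sqrt(lam)              # row t holds z^t: sqrt(lambda_j) * c_j^t

def squared_distances(A):
    diff = A[:, None, :] - A[None, :, :]
    return (diff ** 2).sum(axis=-1)

# Hypothetical data: 8 instances that already live in 2 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
D2 = squared_distances(X)

Z = classical_mds(D2, k=2)
distances_preserved = np.allclose(squared_distances(Z), D2)
```

Because the example data are two-dimensional and $k = 2$, the embedding reproduces the pairwise distances up to rounding; the coordinates themselves may differ from the original ones by a rotation, reflection, or translation.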

In [5], it has been shown that the nonzero eigenvalues of $XX^T$ ($N \times N$) are the same as those of $X^T X$ ($d \times d$) and the eigenvectors are related by a simple linear transformation.

This shows that PCA does the same work as MDS and does it more easily.
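The relation between the two eigenproblems can be verified on a small example. The sketch below assumes NumPy and hypothetical centered data with $N = 7$ and $d = 3$; it checks that the $d$ largest eigenvalues of $XX^T$ equal the eigenvalues of $X^T X$, and that multiplying an eigenvector $w$ of $X^T X$ by $X$ gives an eigenvector of $XX^T$ with the same eigenvalue.

```python
import numpy as np

# Hypothetical centered data: N = 7 instances, d = 3 dimensions.
rng = np.random.default_rng(1)
X = rng.normal(size=(7, 3))
X -= X.mean(axis=0)

big = np.linalg.eigvalsh(X @ X.T)     # N eigenvalues of X X^T, ascending
small = np.linalg.eigvalsh(X.T @ X)   # d eigenvalues of X^T X, ascending

# The d largest eigenvalues of X X^T match those of X^T X; the rest are zero.
nonzero_match = np.allclose(big[-3:], small)
rest_are_zero = np.allclose(big[:-3], 0.0, atol=1e-8)

# If (X^T X) w = lambda w, then (X X^T)(X w) = lambda (X w).
lam, W = np.linalg.eigh(X.T @ X)
V = X @ W                              # columns are unnormalized eigenvectors of X X^T
relation_holds = np.allclose((X @ X.T) @ V, V * lam)
```

Solving the $d \times d$ problem and mapping the eigenvectors through $X$ avoids the $N \times N$ decomposition, which is the cheaper route whenever $d < N$.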

In the general case, we want to find a mapping $z = g(x \mid \theta)$, where $z \in \mathbb{R}^k$, $x \in \mathbb{R}^d$, and $g(x \mid \theta)$ is the mapping function from $d$ to $k$ dimensions defined up to a set of parameters $\theta$. Classical MDS, which we discussed previously, corresponds to a linear transformation

$$z = g(x \mid W) = W^T x$$

but in the general case, a nonlinear mapping can also be used; this is called Sammon mapping. The normalized error in mapping is called the Sammon stress and is defined as

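$$E(\theta \mid \mathcal{X}) = \sum_{r,s} \frac{\left( \| z^r - z^s \| - \| x^r - x^s \| \right)^2}{\| x^r - x^s \|^2}$$

$$= \sum_{r,s} \frac{\left( \| g(x^r \mid \theta) - g(x^s \mid \theta) \| - \| x^r - x^s \| \right)^2}{\| x^r - x^s \|^2}$$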
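The Sammon stress can be evaluated directly for any candidate set of mapped points. The following is a minimal sketch assuming NumPy; the function names and data are illustrative. Two assumptions are made that the text does not state: each pair $(r, s)$ is counted once with $r < s$, and no two instances coincide (otherwise the denominator is zero).

```python
import numpy as np

def pairwise_distances(A):
    """Euclidean distance between every pair of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sammon_stress(X, Z):
    """Squared distance error in the mapped space, normalized by the squared original distance."""
    dx = pairwise_distances(X)              # ||x^r - x^s||
    dz = pairwise_distances(Z)              # ||z^r - z^s||
    r, s = np.triu_indices(len(X), k=1)     # assumption: each pair once, r < s
    return np.sum((dz[r, s] - dx[r, s]) ** 2 / dx[r, s] ** 2)

# Hypothetical data: 5 instances in 3 dimensions.
rng = np.random.default_rng(2)
X = rng.normal(size=(5, 3))

stress_same = sammon_stress(X, X)           # distances unchanged, so no error
stress_proj = sammon_stress(X, X[:, :2])    # dropping a coordinate shrinks distances
stress_scaled = sammon_stress(X, 2.0 * X)   # every pair contributes exactly 1
```

Here the mapped points are supplied directly; in Sammon mapping they would be $g(x^r \mid \theta)$, with $\theta$ adjusted by an optimizer to reduce the stress.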
