the data matrix is
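$$
\mathbf{X} =
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1r} \\
x_{21} & x_{22} & \cdots & x_{2r} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{Nr}
\end{pmatrix}
\tag{3.10}
$$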
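The following sketch shows how such a data matrix is typically held in code. It assumes, as the layout of (3.10) suggests, that each of the $N$ rows is one observation and each of the $r$ columns is one variable, and that the covariance matrix $C_x$ used later in this section is the sample covariance of the columns. The numeric values are hypothetical.

```python
import numpy as np

# Hypothetical data: N = 5 observations (rows) of r = 3 variables (columns),
# laid out like the data matrix X in (3.10).
X = np.array([
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 1.1],
    [2.2, 2.9, 0.3],
    [1.9, 2.2, 0.8],
    [3.1, 3.0, 0.2],
])
N, r = X.shape

# Sample mean of each variable and the r x r sample covariance matrix C_x.
mu_x = X.mean(axis=0)
Xc = X - mu_x                  # centre each column
C_x = Xc.T @ Xc / (N - 1)      # unbiased sample covariance

# NumPy's built-in gives the same matrix when columns are variables.
assert np.allclose(C_x, np.cov(X, rowvar=False))
```

Computing $C_x$ by hand first makes the centring step explicit; `np.cov` with `rowvar=False` is the usual shortcut.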
The vectors $e_i$ are the generators of the subspace $V$. The aim is to find a new basis $(y_1, y_2, \ldots, y_m)$ that defines a new subspace retaining as much of the information in the original data as possible:
$$
\begin{aligned}
y_1 &= w_{11}x_1 + w_{12}x_2 + \cdots + w_{1r}x_r \\
y_2 &= w_{21}x_1 + w_{22}x_2 + \cdots + w_{2r}x_r \\
&\;\;\vdots \\
y_m &= w_{m1}x_1 + w_{m2}x_2 + \cdots + w_{mr}x_r
\end{aligned}
\tag{3.11}
$$
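System (3.11) is a single matrix product. In this sketch the weights are stored in an $r \times m$ array `W` whose column $i$ holds $(w_{i1}, \ldots, w_{ir})$, so that $y_i = \mathbf{w}_i^T\mathbf{x}$, matching the $\mathbf{w}^T\mathbf{x}$ notation used below. The data and weight values are hypothetical; the explicit loop is only there to show that the product reproduces (3.11) term by term.

```python
import numpy as np

# Hypothetical example: r = 3 original variables, m = 2 new ones (m < r).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))          # N = 6 observations in rows

# W is r x m; column i holds the coefficients (w_i1, ..., w_ir) of y_i.
W = np.array([
    [0.6, -0.8],
    [0.8,  0.6],
    [0.0,  0.0],
])

# Each row of Y is (y_1, ..., y_m) for one observation.
Y = X @ W

# Same computation written out exactly as in (3.11): y_i = sum_j w_ij x_j.
Y_loop = np.zeros((X.shape[0], W.shape[1]))
for n in range(X.shape[0]):
    for i in range(W.shape[1]):
        Y_loop[n, i] = sum(W[j, i] * X[n, j] for j in range(X.shape[1]))

assert np.allclose(Y, Y_loop)
```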
where $m < r$. If $\mu_y = E(\mathbf{y})$ is the expected value of $\mathbf{y}$, it can be shown that
$$
\mu_y = E(\mathbf{w}^T\mathbf{x}) = \mathbf{w}^T E(\mathbf{x})
\tag{3.12}
$$
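Equation (3.12) is linearity of expectation, and it holds exactly for sample means as well. The check below uses hypothetical random data and an arbitrary $r \times m$ weight array `W` (column $i$ is $\mathbf{w}_i$, as in (3.11)).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=[1.0, -2.0, 0.5], size=(1000, 3))  # hypothetical data
W = rng.normal(size=(3, 2))                          # hypothetical r x m weights

mu_x = X.mean(axis=0)           # sample estimate of E(x)
mu_y = (X @ W).mean(axis=0)     # sample estimate of E(y), with y = w^T x

# (3.12): the mean of the projection equals the projection of the mean.
assert np.allclose(mu_y, W.T @ mu_x)
```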
The covariance matrix of $\mathbf{y}$ is
$$
C_y = E\left\{(\mathbf{y} - \mu_y)(\mathbf{y} - \mu_y)^T\right\} = \mathbf{w}^T C_x \mathbf{w}
\tag{3.13}
$$
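Relation (3.13) can be verified numerically: because centring commutes with right-multiplication by the weights, the sample covariance of the projected data equals $\mathbf{w}^T C_x \mathbf{w}$ exactly (up to rounding). The data and weights below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))  # correlated hypothetical data
W = rng.normal(size=(4, 2))                              # hypothetical r x m weights

C_x = np.cov(X, rowvar=False)          # r x r covariance of x
C_y = np.cov(X @ W, rowvar=False)      # m x m covariance of y = w^T x

# (3.13): C_y = w^T C_x w
assert np.allclose(C_y, W.T @ C_x @ W)
```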
To obtain the subspace with maximum variability in the data, we maximize the variance of $\mathbf{y}$, given by the covariance matrix $C_y$, while imposing the orthonormality constraint on $\mathbf{w}$:
$$
\mathbf{w}^T\mathbf{w} = I
\tag{3.14}
$$
Introducing a Lagrange multiplier $\lambda$, we have to maximize:
$$
\mathbf{w}^T C_x \mathbf{w} - \lambda\left(\mathbf{w}^T\mathbf{w} - I\right)
\tag{3.15}
$$
Differentiating with respect to $\mathbf{w}$ and setting the result equal to zero gives
$$
\left(C_x - \lambda I\right)\mathbf{w} = 0
\tag{3.16}
$$
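Equation (3.16) is the standard symmetric eigenvalue problem. This sketch solves it with NumPy's `np.linalg.eigh`, which is intended for symmetric matrices such as $C_x$, returns eigenvalues in ascending order and returns orthonormal eigenvectors as columns, so the constraint (3.14) is satisfied automatically. The mixing matrix used to generate correlated data is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
mix = np.array([[3.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.5, 0.2, 0.3]])        # hypothetical mixing, gives correlated columns
X = rng.normal(size=(200, 3)) @ mix
C_x = np.cov(X, rowvar=False)

# Eigenvalues (ascending) and orthonormal eigenvectors (columns of w).
lam, w = np.linalg.eigh(C_x)

# Each eigenpair satisfies (3.16): (C_x - lambda I) w = 0.
for k in range(len(lam)):
    residual = (C_x - lam[k] * np.eye(3)) @ w[:, k]
    assert np.allclose(residual, 0.0)

# The eigenvectors also satisfy the orthonormality constraint (3.14).
assert np.allclose(w.T @ w, np.eye(3))
```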
The problem thus reduces to computing the eigenvectors of $C_x$. The eigenvectors associated with the largest eigenvalues span the subspace in which the data have the highest variability (Fig. 3.2).
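Putting the steps together, a minimal PCA sketch sorts the eigenpairs of $C_x$ by decreasing eigenvalue, keeps the first $m$ eigenvectors as the columns of the weight matrix and projects the centred data onto them. The helper name `pca` and the synthetic data are hypothetical; here $m$ is simply passed in, since the criterion for choosing it is the subject of the next paragraph.

```python
import numpy as np

def pca(X, m):
    """Hypothetical helper: project the N x r data matrix X onto its
    m leading principal directions. Returns (Y, W, lam) with W of shape r x m
    and lam holding all eigenvalues of C_x in decreasing order."""
    Xc = X - X.mean(axis=0)
    C_x = np.cov(Xc, rowvar=False)
    lam, V = np.linalg.eigh(C_x)          # ascending order
    order = np.argsort(lam)[::-1]         # largest eigenvalues first
    lam, V = lam[order], V[:, order]
    W = V[:, :m]                          # eigenvectors of the m largest eigenvalues
    return Xc @ W, W, lam

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 3)) * np.array([5.0, 2.0, 0.5])  # very unequal spreads
Y, W, lam = pca(X, m=2)

# Each new variable's sample variance equals its eigenvalue (W^T C_x W is diagonal).
assert np.allclose(np.var(Y, axis=0, ddof=1), lam[:2])
assert np.allclose(W.T @ W, np.eye(2))
```

`ddof=1` is used so that the variances share the $N-1$ normalization of `np.cov`.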
To choose the number of principal components, one criterion for selecting the eigenvalues is an index of variability, defined as follows: