the principal components are less redundant than the original observations.
In Sect. 3.3 we will see that ICA extends this statistical independence property to
orders higher than two.
3.2.2.3 PCA as a Data Compression Technique
Another interesting property of PCA, closely related to the previous one, is its data reduction or approximation capability, i.e., its ability to construct simple representations of the available data with reduced dimensionality. Indeed, PCA solves an important problem in signal processing that can be stated as follows. Let us consider the linear component $z$ of Eq. (3.3), where $\mathbf{w} \in \mathbb{R}^L$ is an unknown vector of linear combination coefficients. The best approximation of the original data from this signal can be computed by minimizing the mean square error (MSE)
$$\Psi_{\mathrm{MSE}}(\mathbf{w}, \mathbf{h}) = \mathrm{E}\left\{ \left\| \mathbf{x} - \mathbf{h}\,z \right\|^2 \right\}, \qquad (3.7)$$
where $\mathbf{h} \in \mathbb{R}^L$ is an unknown vector allowing the projection of $z$ back onto the original $L$-dimensional observation space and $\|\cdot\|$ denotes the 2-norm of its vector argument. Note that $\Psi_{\mathrm{MSE}}$ is also a function of vector $\mathbf{w}$ through relationship (3.3). To find the optimal values of $\mathbf{w}$ and $\mathbf{h}$, we must cancel the gradient of Eq. (3.7) with respect to both vectors, leading to the equalities
$$\mathbf{w} = \mathbf{h} / \|\mathbf{h}\|^2, \qquad (3.8)$$
$$\mathbf{h} = \mathbf{R}_x \mathbf{w} / (\mathbf{w}^{\mathrm{T}} \mathbf{R}_x \mathbf{w}), \qquad (3.9)$$
where we have assumed that $\mathbf{R}_x$ is full rank. We set $\|\mathbf{w}\| = 1$ to fix the scale ambiguity in Eq. (3.7), since a scale factor can be exchanged between $\mathbf{w}$ and $\mathbf{h}$ without altering the MSE. Combining Eqs. (3.8) and (3.9) proves that the optimal $\mathbf{w}$ and $\mathbf{h}$ are identical, and equal to the dominant eigenvector of $\mathbf{R}_x$, i.e., $\mathbf{w} = \mathbf{h} = \mathbf{u}_1$.
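As a sketch of the intermediate steps (not spelled out in the text), one can substitute $z = \mathbf{w}^{\mathrm{T}}\mathbf{x}$ from Eq. (3.3) and write $\mathbf{R}_x = \mathrm{E}\{\mathbf{x}\mathbf{x}^{\mathrm{T}}\}$ to expand the MSE in Eq. (3.7) and cancel its gradients:
$$
\begin{aligned}
\Psi_{\mathrm{MSE}}(\mathbf{w},\mathbf{h})
  &= \mathrm{E}\{\|\mathbf{x}-\mathbf{h}\,\mathbf{w}^{\mathrm{T}}\mathbf{x}\|^2\}
   = \mathrm{tr}(\mathbf{R}_x) - 2\,\mathbf{h}^{\mathrm{T}}\mathbf{R}_x\mathbf{w}
     + \|\mathbf{h}\|^2\,\mathbf{w}^{\mathrm{T}}\mathbf{R}_x\mathbf{w}, \\
\nabla_{\mathbf{h}}\Psi_{\mathrm{MSE}} = \mathbf{0}
  &\;\Rightarrow\; (\mathbf{w}^{\mathrm{T}}\mathbf{R}_x\mathbf{w})\,\mathbf{h} = \mathbf{R}_x\mathbf{w}
  \;\Rightarrow\; \text{Eq. (3.9)}, \\
\nabla_{\mathbf{w}}\Psi_{\mathrm{MSE}} = \mathbf{0}
  &\;\Rightarrow\; \mathbf{R}_x\big(\|\mathbf{h}\|^2\mathbf{w} - \mathbf{h}\big) = \mathbf{0}
  \;\Rightarrow\; \text{Eq. (3.8) for full-rank } \mathbf{R}_x, \\
\text{(3.8) in (3.9)}
  &\;\Rightarrow\; \mathbf{R}_x\mathbf{w}
   = \|\mathbf{h}\|^2\,(\mathbf{w}^{\mathrm{T}}\mathbf{R}_x\mathbf{w})\,\mathbf{w},
\end{aligned}
$$
so the stationary points of the MSE are eigenvectors of $\mathbf{R}_x$; among them, the dominant eigenvector yields the smallest residual error.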
As we have seen in Sect. 3.2.2.1, this eigenvector is also the dominant principal direction of the observed data, $\mathbf{w}_1$, so that signal $z$ in Eq. (3.3) turns out to be the dominant principal component of $\mathbf{x}$, i.e., the entry $z_1$ of vector $\mathbf{z}$ in Eq. (3.6).
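To make this construction concrete, here is a minimal NumPy sketch (not part of the original text; the synthetic data and the names X, R_x, w1, z1, X_hat are illustrative assumptions) that estimates the dominant principal direction from zero-mean sample data and forms the corresponding rank-1 approximation:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative zero-mean observations: L = 3 channels, T = 10000 samples,
# obtained by linearly mixing independent Gaussian signals.
L, T = 3, 10_000
X = rng.normal(size=(L, L)) @ rng.normal(size=(L, T))   # columns are x(t)

# Sample estimate of R_x = E{x x^T}
R_x = (X @ X.T) / T

# Eigendecomposition of the symmetric matrix R_x (ascending eigenvalues)
eigvals, eigvecs = np.linalg.eigh(R_x)
w1 = eigvecs[:, -1]            # dominant principal direction (unit norm)

# Dominant principal component z_1 = w_1^T x and rank-1 approximation h z_1
z1 = w1 @ X                    # shape (T,)
X_hat = np.outer(w1, z1)       # rank-1 approximation of the data

mse = np.mean(np.sum((X - X_hat) ** 2, axis=0))
print("empirical rank-1 MSE:", mse)

Here np.linalg.eigh is used because the sample correlation matrix is symmetric, and its last eigenvector corresponds to the largest eigenvalue.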
Additional algebraic manipulations show that the MSE of this optimal rank-1 approximation is $\sum_{i=2}^{L} \lambda_i$, where $\lambda_i$ denotes the $i$th eigenvalue of $\mathbf{R}_x$. The second principal direction $\mathbf{w}_2$ is computed by repeating the above procedure on the error vector or residual data $(\mathbf{x} - \mathbf{w}_1 z_1)$, and so forth. This minimum-error derivation of PCA proves, as a by-product, that the second principal direction $\mathbf{w}_2$ must lie orthogonal to $\mathbf{w}_1$, since the error vector is indeed orthogonal to $\mathbf{w}_1$:
$$\mathbf{w}_1^{\mathrm{T}} (\mathbf{x} - \mathbf{w}_1 z_1) = \mathbf{w}_1^{\mathrm{T}} (\mathbf{x} - \mathbf{w}_1 \mathbf{w}_1^{\mathrm{T}} \mathbf{x}) = \mathbf{w}_1^{\mathrm{T}} \mathbf{x} - \|\mathbf{w}_1\|^2\, \mathbf{w}_1^{\mathrm{T}} \mathbf{x} = 0.$$
The minimum-variance derivation of the previous section imposed this orthogonality property to avoid extracting the same principal component twice.
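In the same illustrative NumPy setting as the earlier sketch (repeated here so the snippet is self-contained; the data and names remain hypothetical), the following checks numerically that the rank-1 MSE equals the sum of the discarded eigenvalues and that deflating the data by the first principal component yields a second direction orthogonal to the first:

import numpy as np

rng = np.random.default_rng(0)

# Same illustrative zero-mean data as before: L = 3 channels, T = 10000 samples.
L, T = 3, 10_000
X = rng.normal(size=(L, L)) @ rng.normal(size=(L, T))

R_x = (X @ X.T) / T                      # sample correlation matrix
eigvals, eigvecs = np.linalg.eigh(R_x)   # eigenvalues in ascending order

w1 = eigvecs[:, -1]                      # dominant principal direction
z1 = w1 @ X                              # dominant principal component
E1 = X - np.outer(w1, z1)                # residual data x - w_1 z_1

# The rank-1 MSE matches the sum of the remaining eigenvalues of R_x.
mse = np.mean(np.sum(E1 ** 2, axis=0))
print(mse, eigvals[:-1].sum())           # the two values agree

# Repeating the procedure on the residual gives the second principal
# direction w_2, orthogonal to w_1.
R_e = (E1 @ E1.T) / T
_, vecs_e = np.linalg.eigh(R_e)
w2 = vecs_e[:, -1]
print(abs(w1 @ w2))                      # close to 0

The residual correlation matrix has (numerically) zero power along w1, so its dominant eigenvector coincides with the second eigenvector of R_x, illustrating the deflation argument above.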