Directions of Maximal Variance
Originally, PCA was formulated as a dimension-reduction technique. In
its simplest form, it iteratively determines the most "interesting"
signal component in the data and then continues the search in the
complement of this component. For any such dimension-reduction or
deflation technique, we need to specify how to differentiate between
signal and noise in the projected data. In PCA, this is achieved by
considering data to be interesting if it has high variance.
Note that from here on, for simplicity we denote random vectors by
lowercase letters. Given a random vector $x \in \mathbb{R}^n$ with existing
covariance, we first center it and may then assume $\mathrm{E}(x) = 0$. The
projection is defined as follows:
$$
f : S^{n-1} \subset \mathbb{R}^n \longrightarrow \mathbb{R}, \qquad
w \longmapsto \operatorname{var}(w^\top x),
\tag{3.2}
$$
where
$$
S^{n-1} := \{\, w \in \mathbb{R}^n \mid |w| = 1 \,\}
$$
denotes the $(n-1)$-dimensional unit sphere in $\mathbb{R}^n$, and
$|w| = \bigl(\sum_i w_i^2\bigr)^{1/2}$ denotes the Euclidean norm.
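Since $x$ is centered, the objective in (3.2) can be rewritten in terms of the covariance matrix of $x$; this standard identity is not stated explicitly in the excerpt above, but it makes the connection to eigenvectors concrete:
$$
\operatorname{var}(w^\top x)
= \mathrm{E}\bigl[(w^\top x)^2\bigr]
= w^\top\, \mathrm{E}(x x^\top)\, w
= w^\top \operatorname{Cov}(x)\, w .
$$
Maximizing this quadratic form over the unit sphere yields the leading eigenvector of $\operatorname{Cov}(x)$.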
Without the restriction to unit norm, maximization of $f$ would be
ill-posed, so clearly such a constraint is necessary. The first principal
component of $x$ is now defined as the random variable
$$
y_1 := w_1^\top x = \sum_i (w_1)_i\, x_i ,
$$
generated by projecting $x$ along a global maximum $w_1$ of $f$.
The function $f$ may, for instance, be maximized by a local algorithm,
such as gradient ascent constrained to the unit sphere (e.g., by
normalization of $w$ after each update).
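A minimal sketch of such a constrained gradient ascent in Python/NumPy follows; the function name, the sample matrix X (rows are centered observations of $x$), the step size, and the iteration count are illustrative assumptions rather than anything prescribed in the text:

```python
import numpy as np

def first_principal_direction(X, step=0.1, n_iter=500, seed=0):
    """Maximize f(w) = var(w^T x) over the unit sphere by gradient ascent.

    X is an (m, n) array of m centered observations of the random vector x.
    The gradient of w^T C w (with C the sample covariance) is 2 C w; after
    each ascent step, w is renormalized so that it stays on the sphere S^{n-1}.
    """
    rng = np.random.default_rng(seed)
    C = np.cov(X, rowvar=False)            # sample estimate of Cov(x)
    w = rng.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        w = w + step * 2.0 * C @ w         # unconstrained gradient step
        w /= np.linalg.norm(w)             # project back onto the unit sphere
    return w
```

Up to sign, the returned direction should agree with the leading eigenvector of the sample covariance, which can be checked against `np.linalg.eigh`.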
A second principal component $y_2$ is calculated by assuming that the
projection $w_2$ also maximizes $f$, but at the same time $y_2$ is decorrelated
from $y_1$, so $\mathrm{E}(y_1 y_2) = 0$ (note that the $y_i$ are centered because $x$ is
centered). Iteratively, we can determine principal components $y_i$. Such
an iterative projection method is called deflation and will be studied
in more detail for a different projection in the setting of ICA (see
section 4.5).
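As a rough sketch of this deflation idea in the same NumPy setting (the helper name and parameters are assumptions, and the inner loop simply reuses the constrained gradient ascent from above): after each direction is found, the corresponding component is removed from the data, so the next maximization is automatically decorrelated from the previous ones.

```python
import numpy as np

def principal_directions(X, k, step=0.1, n_iter=500, seed=0):
    """Deflation: extract k principal directions one at a time."""
    rng = np.random.default_rng(seed)
    Xd = X - X.mean(axis=0)                # center, so E(x) = 0 empirically
    W = []
    for _ in range(k):
        C = np.cov(Xd, rowvar=False)
        w = rng.standard_normal(Xd.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):            # same constrained ascent as before
            w = w + step * 2.0 * C @ w
            w /= np.linalg.norm(w)
        W.append(w)
        # Deflate: remove the found component y_j = w^T x from the data, so
        # the next direction is (empirically) decorrelated from y_j.
        Xd = Xd - np.outer(Xd @ w, w)
    return np.array(W)
```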