Typically, in collections of $n$-dimensional points $x_t \equiv [x_{t,1}, \ldots, x_{t,n}]^T$, $t = 1, 2, \ldots$, there exist correlations between the $n$ dimensions (which correspond to streams in our setting). These can be captured by principal components analysis (PCA). Consider for example the setting in Figure 5.3. There is a visible linear correlation. Thus, if we represent every point with its projection on the direction of $w_1$, the error of this approximation is very small. In fact, the first principal direction $w_1$ is optimal in the following sense.
Definition 5.1 (First principal component). Given a collection of $n$-dimensional vectors $x_\tau \in \mathbb{R}^n$, $\tau = 1, 2, \ldots, t$, the first principal direction $w_1 \in \mathbb{R}^n$ is the vector minimizing the sum of squared residuals, i.e.,
$$
w_1 := \arg\min_{\|w\|=1} \sum_{\tau=1}^{t} \left\| x_\tau - (w w^T) x_\tau \right\|^2 .
$$
The projection of $x_\tau$ on $w_1$ is the first principal component (PC) $y_{\tau,1} := w_1^T x_\tau$, $\tau = 1, \ldots, t$.
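To make the definition concrete, here is a minimal sketch in Python/NumPy (the data and variable names are illustrative, not from the text). It uses the standard fact that the minimizer above is the top right singular vector of the $t \times n$ matrix whose rows are the $x_\tau$:

```python
import numpy as np

# Sketch of Definition 5.1: over unit vectors w, the minimizer of
# sum_tau ||x_tau - (w w^T) x_tau||^2 is the top right singular vector
# of the t x n matrix X whose rows are the vectors x_tau.
rng = np.random.default_rng(0)
t, n = 500, 3
# Illustrative, strongly correlated streams (cf. Figure 5.3).
X = rng.normal(size=(t, 1)) @ rng.normal(size=(1, n)) \
    + 0.05 * rng.normal(size=(t, n))

_, _, Vt = np.linalg.svd(X, full_matrices=False)
w1 = Vt[0]                    # first principal direction, ||w1|| = 1

y1 = X @ w1                   # first PCs: y_{tau,1} = w1^T x_tau
X_tilde = np.outer(y1, w1)    # reconstructions: x~_tau = y_{tau,1} w1
print(np.sum((X - X_tilde) ** 2))  # small: the streams are nearly collinear
```

As in the definition above, the sketch works on the raw (uncentered) data; textbook PCA usually subtracts the column means first.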
Note that, since $\|w_1\| = 1$, we have $(w_1 w_1^T) x_\tau = (w_1^T x_\tau) w_1 = y_{\tau,1} w_1 =: \tilde{x}_\tau$, where $\tilde{x}_\tau$ is the projection of $y_{\tau,1}$ back into the original $n$-D space. That is, $\tilde{x}_\tau$ is the reconstruction of the original measurements from the first PC $y_{\tau,1}$. More generally, PCA will produce $k$ vectors $w_1, w_2, \ldots, w_k$ such that, if we represent each $n$-D data point $x_t := [x_{t,1} \cdots x_{t,n}]^T$ with its $k$-D projection $y_t := [w_1^T x_t \cdots w_k^T x_t]^T$, then this representation minimizes the squared error $\sum_\tau \|x_\tau - \tilde{x}_\tau\|^2$. Furthermore, the principal directions are orthogonal, so the principal components $y_{\tau,i}$, $1 \le i \le k$, are by construction uncorrelated, i.e., if $y^{(i)} := [y_{1,i}, \ldots, y_{t,i}, \ldots]^T$ is the stream of the $i$-th principal component, then $(y^{(i)})^T y^{(j)} = 0$ if $i \ne j$.
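The following sketch (again with made-up data and illustrative names such as `Wk` and `Y`) illustrates the $k$-dimensional representation, its reconstruction, and the uncorrelatedness of the PC streams:

```python
import numpy as np

# Sketch of the k-PC representation (synthetic, nearly rank-k data;
# all names are illustrative).
rng = np.random.default_rng(0)
t, n, k = 500, 5, 2
X = rng.normal(size=(t, k)) @ rng.normal(size=(k, n)) \
    + 0.05 * rng.normal(size=(t, n))

_, _, Vt = np.linalg.svd(X, full_matrices=False)
Wk = Vt[:k].T                 # n x k matrix whose columns are w_1, ..., w_k
Y = X @ Wk                    # row tau holds y_tau = [w_1^T x_tau ... w_k^T x_tau]^T
X_tilde = Y @ Wk.T            # reconstructions x~_tau from the k PCs

print(np.sum((X - X_tilde) ** 2))  # small squared error for nearly rank-k data
print(np.round(Y.T @ Y, 6))        # diagonal matrix: (y^(i))^T y^(j) = 0 for i != j
```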
Observation 3.1 (Dimensionality reduction). If we represent each $n$-dimensional point $x_\tau \in \mathbb{R}^n$ using all $n$ principal components, then the error $\|x_\tau - \tilde{x}_\tau\| = 0$. However, in typical datasets, we can achieve a very small error using only $k$ principal components, where $k \ll n$.
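A quick numerical check of Observation 3.1, under the same kind of synthetic setup as above (all names illustrative): the squared error is already tiny for small $k$ when the streams are correlated, and reaches zero (up to rounding) at $k = n$:

```python
import numpy as np

# Reconstruction error of the k-PC representation as k grows
# (synthetic, effectively 2-dimensional data; names illustrative).
rng = np.random.default_rng(1)
t, n = 500, 5
X = rng.normal(size=(t, 2)) @ rng.normal(size=(2, n)) \
    + 0.05 * rng.normal(size=(t, n))

_, _, Vt = np.linalg.svd(X, full_matrices=False)
for k in range(1, n + 1):
    Wk = Vt[:k].T
    err = np.sum((X - X @ Wk @ Wk.T) ** 2)
    print(f"k={k}: squared error = {err:.6f}")  # ~0 from k=2 on; 0 at k=n
```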
In the context of the chlorine example, each point in Figure 5.3 would correspond to the 2-D projection of $x_\tau$ (where $1 \le \tau \le t$) onto the first two principal directions, $w_1$ and $w_2$, which are the most important according to the distribution of $\{x_\tau \mid 1 \le \tau \le t\}$. The principal components $y_{\tau,1}$ and $y_{\tau,2}$ are the coordinates of these projections in the orthogonal coordinate system defined by $w_1$ and $w_2$.
However, batch methods for estimating the principal components require time that depends on the duration $t$, which grows to infinity. In fact, the principal directions are the eigenvectors of $X_t^T X_t$, which are