Problem definition
Given a collection of $n$ co-evolving, semi-infinite streams, producing a value $x_{t,j}$ for every stream $1 \le j \le n$ and for every time-tick $t = 1, 2, \ldots$, SPIRIT does the following: (i) it adapts the number $k$ of hidden variables necessary to explain/summarise the main trends in the collection; (ii) it adapts the participation weights $w_{i,j}$ of the $j$-th stream on the $i$-th hidden variable ($1 \le j \le n$ and $1 \le i \le k$), so as to produce an accurate summary of the stream collection; (iii) it monitors the hidden variables $y_{t,i}$, for $1 \le i \le k$; and (iv) it keeps updating all of the above efficiently.
More precisely, SPIRIT operates on the column-vectors of observed stream values $\mathbf{x}_t \equiv [x_{t,1}, \ldots, x_{t,n}]^T$ and continually updates the participation weights $w_{i,j}$. The participation weight vector $\mathbf{w}_i$ for the $i$-th principal direction is $\mathbf{w}_i := [w_{i,1} \cdots w_{i,n}]^T$. The hidden variables $\mathbf{y}_t \equiv [y_{t,1}, \ldots, y_{t,k}]^T$ are the projections of $\mathbf{x}_t$ onto each $\mathbf{w}_i$, over time (see Table 5.1), i.e.,

$$y_{t,i} := w_{i,1} x_{t,1} + w_{i,2} x_{t,2} + \cdots + w_{i,n} x_{t,n}.$$
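The projection is a dot product of $\mathbf{x}_t$ with each weight vector. A minimal plain-Python sketch follows; the function name `project` and the example numbers are illustrative assumptions, not taken from the chapter. `x_t` holds the $n$ stream values at one time-tick and `weights` holds the $k$ vectors $\mathbf{w}_i$.

```python
# Illustrative sketch: compute the hidden variables y_t from one observation x_t.
# Plain Python lists stand in for column vectors; names are hypothetical.

def project(x_t, weights):
    """Return y_t = [y_{t,1}, ..., y_{t,k}], where y_{t,i} = sum_j w_{i,j} * x_{t,j}."""
    n = len(x_t)
    y_t = []
    for w_i in weights:  # one weight vector w_i (length n) per hidden variable
        if len(w_i) != n:
            raise ValueError("each w_i must have the same length as x_t")
        y_t.append(sum(w_ij * x_tj for w_ij, x_tj in zip(w_i, x_t)))
    return y_t

# Example: n = 3 streams, k = 2 hidden variables.
x_t = [2.0, 2.0, 1.0]
weights = [
    [0.6, 0.8, 0.0],  # w_1
    [0.0, 0.0, 1.0],  # w_2
]
y_t = project(x_t, weights)
print(y_t)
```

Here $y_{t,1} = 0.6 \cdot 2 + 0.8 \cdot 2 = 2.8$ (up to floating-point rounding) and $y_{t,2} = 1$.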
SPIRIT also adapts the number $k$ of hidden variables necessary to capture most of the information. The adaptation is performed so that the approximation achieves a desired mean-square error. In particular, let $\tilde{\mathbf{x}}_t = [\tilde{x}_{t,1} \cdots \tilde{x}_{t,n}]^T$ be the reconstruction of $\mathbf{x}_t$, based on the weights and hidden variables, defined by
$$\tilde{x}_{t,j} := w_{1,j} y_{t,1} + w_{2,j} y_{t,2} + \cdots + w_{k,j} y_{t,k},$$

or, more succinctly,

$$\tilde{\mathbf{x}}_t = \sum_{i=1}^{k} y_{t,i} \mathbf{w}_i.$$
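Reconstruction reverses the projection: each hidden variable scales its weight vector and the results are summed. The sketch below, with hypothetical function names and example values, reconstructs one observation with $k = 1$ and $k = 2$ and reports the squared error $\|\tilde{\mathbf{x}}_t - \mathbf{x}_t\|^2$. It assumes the $\mathbf{w}_i$ are orthonormal, as principal directions are.

```python
# Illustrative sketch: reconstruct x_t from y_t and the weight vectors,
# then measure the squared reconstruction error ||x~_t - x_t||^2.

def project(x_t, weights):
    """y_{t,i} = sum_j w_{i,j} * x_{t,j} for each w_i in weights."""
    return [sum(w * x for w, x in zip(w_i, x_t)) for w_i in weights]

def reconstruct(y_t, weights):
    """x~_t = sum_i y_{t,i} * w_i."""
    x_hat = [0.0] * len(weights[0])
    for y_ti, w_i in zip(y_t, weights):
        for j, w_ij in enumerate(w_i):
            x_hat[j] += y_ti * w_ij
    return x_hat

def squared_error(x_t, x_hat):
    """||x~_t - x_t||^2."""
    return sum((a - b) ** 2 for a, b in zip(x_hat, x_t))

x_t = [3.0, 4.0, 1.0]
w_1 = [0.6, 0.8, 0.0]  # orthonormal weight vectors (assumed)
w_2 = [0.0, 0.0, 1.0]

for weights in ([w_1], [w_1, w_2]):
    x_hat = reconstruct(project(x_t, weights), weights)
    print(len(weights), [round(v, 6) for v in x_hat], round(squared_error(x_t, x_hat), 6))
```

With only $\mathbf{w}_1$, the third stream's value is lost and the squared error is 1. Adding $\mathbf{w}_2$ recovers $\mathbf{x}_t$ exactly, because this particular $\mathbf{x}_t$ lies in the span of $\mathbf{w}_1$ and $\mathbf{w}_2$.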
In the chlorine example, $\mathbf{x}_t$ is the $n$-dimensional column-vector of the original sensor measurements and $\mathbf{y}_t$ is the hidden variable column-vector, both at time $t$. The dimension of $\mathbf{y}_t$ is 1 before/after the leak ($t < 1500$ or $t > 3000$) and 2 during the leak ($1500 \le t \le 3000$), as shown in Figure 5.1.
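In the chlorine example, $k$ grows from 1 to 2 when a new trend appears. The idea of choosing $k$ to meet a desired mean-square error can be illustrated with a simplified batch sketch. It assumes the weight vectors are already known, orthonormal, and ordered by importance, and it uses a plain MSE threshold. SPIRIT itself adapts $k$ incrementally, using criteria explained later, so `choose_k`, `target_mse`, and the synthetic window are hypothetical.

```python
# Hypothetical batch sketch: choose the smallest k whose mean squared
# reconstruction error over a window of observations meets a target.
# Assumes orthonormal weight vectors ordered by importance.

def project(x_t, weights):
    """y_{t,i} = sum_j w_{i,j} * x_{t,j} for each w_i in weights."""
    return [sum(w * x for w, x in zip(w_i, x_t)) for w_i in weights]

def reconstruct(y_t, weights):
    """x~_{t,j} = sum_i w_{i,j} * y_{t,i}."""
    n = len(weights[0])
    return [sum(y * w_i[j] for y, w_i in zip(y_t, weights)) for j in range(n)]

def mean_squared_error(window, weights):
    """Average of ||x~_t - x_t||^2 over the observations in window."""
    total = 0.0
    for x_t in window:
        x_hat = reconstruct(project(x_t, weights), weights)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x_t))
    return total / len(window)

def choose_k(window, ordered_weights, target_mse):
    """Smallest k whose first k weight vectors reach target_mse (else all of them)."""
    for k in range(1, len(ordered_weights) + 1):
        if mean_squared_error(window, ordered_weights[:k]) <= target_mse:
            return k
    return len(ordered_weights)

# Synthetic window of n = 3 streams: almost all variation lies along w_1.
window = [[3.0, 4.0, 0.1], [6.0, 8.0, -0.1], [1.5, 2.0, 0.0]]
ordered_weights = [
    [0.6, 0.8, 0.0],   # w_1
    [0.0, 0.0, 1.0],   # w_2
    [0.8, -0.6, 0.0],  # w_3
]
print(choose_k(window, ordered_weights, target_mse=0.05))   # loose target
print(choose_k(window, ordered_weights, target_mse=0.001))  # tight target
```

With one hidden variable, the mean squared error over this window is about 0.0067. A loose target therefore keeps $k = 1$, while a tighter one forces $k = 2$ to capture the small third component.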
Definition 5.4 (SPIRIT Tracking)
SPIRIT updates the participation weights $w_{i,j}$ so as to guarantee that the reconstruction error $\|\tilde{\mathbf{x}}_t - \mathbf{x}_t\|^2$ over time is predictably small.
This informal definition describes what SPIRIT does; the precise criteria regarding the reconstruction error are explained later. If we assume that the $\mathbf{x}_t$ are drawn according to some distribution that does not change over time (i.e., under stationarity assumptions), then the weight vectors $\mathbf{w}_i$ converge to the principal directions. Even when the data are non-stationary (e.g., they exhibit gradual drift), SPIRIT handles this effectively in practice, as we explain later.
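The convergence claim under stationarity can be illustrated with a small simulation. The update rule below is an assumption. It is a PAST-style (projection approximation subspace tracking) recursive update for a single weight vector, used here as a stand-in because the chapter's own tracking criteria are given later. The forgetting factor `lam`, the initial energy `d_init`, and the synthetic stream are all hypothetical choices.

```python
import math
import random

def track_direction(stream, lam=1.0, d_init=1.0):
    """Track one weight vector w with a PAST-style recursive update (assumed rule).

    For each observation x_t:
        y = w . x_t              hidden variable (projection)
        d = lam * d + y**2       decayed energy of y
        e = x_t - y * w          reconstruction error vector
        w = w + (y / d) * e      nudge w to shrink the error
    """
    w = None
    d = d_init
    for x_t in stream:
        if w is None:
            # Arbitrary start: the first unit basis vector.
            w = [1.0] + [0.0] * (len(x_t) - 1)
        y = sum(wj * xj for wj, xj in zip(w, x_t))
        d = lam * d + y * y
        e = [xj - y * wj for xj, wj in zip(x_t, w)]
        w = [wj + (y / d) * ej for wj, ej in zip(w, e)]
    return w

def stationary_stream(num_ticks, direction, seed=0):
    """Hypothetical stationary data: random multiples of `direction` plus small noise."""
    rng = random.Random(seed)
    for _ in range(num_ticks):
        s = rng.gauss(0.0, 3.0)
        yield [s * u + rng.gauss(0.0, 0.1) for u in direction]

u = [0.6, 0.8]  # true principal direction of the synthetic data
w = track_direction(stationary_stream(2000, u))
norm = math.sqrt(sum(wj * wj for wj in w))
cosine = abs(sum(wj * uj for wj, uj in zip(w, u))) / norm
print(f"|cos(w, u)| = {cosine:.4f}")
```

With `lam = 1.0`, as here, all observations count equally, which matches the stationary setting. A value of `lam` below 1 down-weights older observations, which is one common way to follow gradual drift.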