more appealing than a sliding window, as the latter has explicit buffering
requirements.
7. An Application-driven View: Putting Correlations to Work
We show how we can exploit the correlations and hidden variables to
do (a) forecasting, (b) missing value estimation, (c) summarization of
the large number of streams into a small, manageable number of hidden
variables, and (d) outlier detection. We will use SPIRIT because its
hidden-variable-based approach makes it particularly convenient for such
tasks, though many of these tasks can also be accomplished by other
methods.
7.1 Forecasting and Missing Values
The hidden variables y_t give us a much more compact representation
of the "raw" variables x_t, with guarantees of high reconstruction accuracy
(in terms of relative squared error, which is less than 1 − f_E). When
our streams exhibit correlations, as we often expect to be the case, the
number k of hidden variables is much smaller than the number n
of streams. Therefore, we can apply any forecasting algorithm to the
vector of hidden variables y_t, instead of the raw data vector x_t. This
reduces the time and space complexity by orders of magnitude, because
typical forecasting methods are quadratic or worse in the number of
variables.
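As a quick sanity check of the accuracy guarantee, the following sketch (a hypothetical, batch-mode illustration; SPIRIT itself tracks the principal directions incrementally) builds correlated streams, keeps the smallest k capturing at least a fraction f_E of the energy, and verifies that the relative squared reconstruction error stays below 1 − f_E:

    import numpy as np

    rng = np.random.default_rng(1)
    # Hypothetical data: n = 10 streams driven mostly by 2 latent sources.
    latent = rng.standard_normal((2000, 2))
    X = latent @ rng.standard_normal((2, 10)) + 0.05 * rng.standard_normal((2000, 10))
    X -= X.mean(axis=0)

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    energy = s**2 / np.sum(s**2)
    f_E = 0.95
    k = np.searchsorted(np.cumsum(energy), f_E) + 1   # smallest k retaining >= f_E of the energy
    X_rec = (X @ Vt[:k].T) @ Vt[:k]                   # reconstruct from the k hidden variables
    rel_err = np.sum((X - X_rec) ** 2) / np.sum(X ** 2)
    print(k, rel_err, 1 - f_E)                        # rel_err stays below 1 - f_E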
In particular, we fit the forecasting model on y_t instead of x_t. The
model provides an estimate ŷ_{t+1} = f(y_t) and we can use this to get an
estimate for

    x̂_{t+1} := ŷ_{t+1,1} w_1[t] + · · · + ŷ_{t+1,k} w_k[t],

using the weight estimates w_i[t] from the previous time tick t. We
chose auto-regression for its intuitiveness and simplicity, but any online
method can be used.
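To make the forecasting step concrete, here is a minimal sketch (hypothetical helper names; it assumes the hidden-variable history and the current weight vectors w_i[t] have already been produced by SPIRIT) that fits one AR model per hidden variable and reconstructs a forecast for the raw streams:

    import numpy as np

    def fit_ar(series, ell):
        """Least-squares fit of a univariate AR(ell) model."""
        X = np.array([series[i:i + ell] for i in range(len(series) - ell)])
        coeffs, *_ = np.linalg.lstsq(X, series[ell:], rcond=None)
        return coeffs

    def forecast_raw(Y_hist, W, ell=3):
        """One-step-ahead forecast of the raw streams.

        Y_hist : (T, k) history of the hidden variables y_1, ..., y_T
        W      : (k, n) matrix whose rows are the current weights w_i[t]
        """
        k = Y_hist.shape[1]
        # One independent AR model per hidden variable (the hidden variables are uncorrelated).
        y_next = np.array([fit_ar(Y_hist[:, i], ell) @ Y_hist[-ell:, i] for i in range(k)])
        # x_hat_{t+1} = y_hat_{t+1,1} w_1[t] + ... + y_hat_{t+1,k} w_k[t]
        return W.T @ y_next

In a streaming setting the AR coefficients would of course be updated incrementally (e.g., with recursive least squares) rather than refit from scratch; the batch fit above is only for clarity.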
Correlations. Since the principal directions are orthogonal to one another
(w_i ⊥ w_j for i ≠ j), the components of y_t are by construction uncorrelated:
the correlations have already been captured by the w_i, 1 ≤ i ≤ k.
We can take advantage of this de-correlation to reduce forecasting
complexity. In particular, for auto-regression we found that one AR model
per hidden variable provides results comparable to multivariate AR.
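The decorrelation claim is easy to verify numerically; the sketch below (a batch-PCA stand-in for SPIRIT's incremental tracking, with made-up mixing weights) projects correlated streams onto the principal directions and checks that the resulting hidden variables are nearly uncorrelated:

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical example: n = 3 correlated streams driven by 2 latent sources.
    latent = rng.standard_normal((5000, 2))
    mix = np.array([[1.0, 0.8, 0.5],
                    [0.2, 0.6, 1.0]])
    X = latent @ mix + 0.05 * rng.standard_normal((5000, 3))
    X -= X.mean(axis=0)

    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Y = X @ Vt[:2].T                        # hidden variables: projections onto w_1, w_2
    print(np.round(np.corrcoef(X.T), 2))    # raw streams: large off-diagonal correlations
    print(np.round(np.corrcoef(Y.T), 2))    # hidden variables: off-diagonals ~ 0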
Auto-regression. Space complexity for multivariate AR (e.g., MUSCLES [58]) is O(n³ℓ²), where ℓ is the auto-regression window length.
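For a rough sense of scale (hypothetical numbers, and assuming one univariate AR(ℓ) model per hidden variable with O(ℓ²) space each, plus O(kn) space for the weight vectors w_i): with n = 1000 streams, window ℓ = 10 and k = 5 hidden variables, multivariate AR needs on the order of n³ℓ² = 10¹¹ values, whereas the per-hidden-variable approach needs on the order of kn + kℓ² = 5000 + 500 = 5500.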