more appealing than a sliding window, as the latter has explicit buffering
requirements.
7. An Application-driven View: Putting Correlations to Work
We show how we can exploit the correlations and hidden variables to
do (a) forecasting, (b) missing value estimation, (c) summarization of
the large number of streams into a small, manageable number of hidden
variables, and (d) outlier detection. We will use SPIRIT because its
hidden-variable-based approach makes it particularly convenient for such
tasks, though many of these tasks can also be accomplished by other
methods.
7.1 Forecasting and Missing Values
The hidden variables y_t give us a much more compact representation
of the "raw" variables x_t, with guarantees of high reconstruction accuracy
(in terms of relative squared error, which is less than 1 − f_E). When
our streams exhibit correlations, as we often expect to be the case, the
number k of hidden variables is much smaller than the number n
of streams. Therefore, we can apply any forecasting algorithm to the
vector of hidden variables y_t, instead of the raw data vector x_t. This
reduces the time and space complexity by orders of magnitude, because
typical forecasting methods are quadratic or worse in the number of
variables.
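As a quick sanity check of the accuracy guarantee, the following sketch (a hypothetical, batch-mode illustration; SPIRIT itself tracks the principal directions incrementally) builds correlated streams, keeps the smallest k capturing at least a fraction f_E of the energy, and verifies that the relative squared reconstruction error stays below 1 − f_E:

    import numpy as np

    rng = np.random.default_rng(1)
    # Hypothetical data: n = 10 streams driven mostly by 2 latent sources.
    latent = rng.standard_normal((2000, 2))
    X = latent @ rng.standard_normal((2, 10)) + 0.05 * rng.standard_normal((2000, 10))
    X -= X.mean(axis=0)

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    energy = s**2 / np.sum(s**2)
    f_E = 0.95
    k = np.searchsorted(np.cumsum(energy), f_E) + 1   # smallest k retaining >= f_E of the energy
    X_rec = (X @ Vt[:k].T) @ Vt[:k]                   # reconstruct from the k hidden variables
    rel_err = np.sum((X - X_rec) ** 2) / np.sum(X ** 2)
    print(k, rel_err, 1 - f_E)                        # rel_err stays below 1 - f_E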
In particular, we fit the forecasting model on y_t instead of x_t. The
model provides an estimate ŷ_{t+1} = f(y_t) and we can use this to get an
estimate for

    x̂_{t+1} := ŷ_{t+1,1} w_1[t] + · · · + ŷ_{t+1,k} w_k[t],

using the weight estimates w_i[t] from the previous time tick t. We
chose auto-regression for its intuitiveness and simplicity, but any online
method can be used.
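To make the forecasting step concrete, here is a minimal sketch (hypothetical helper names; it assumes the hidden-variable history and the current weight vectors w_i[t] have already been produced by SPIRIT) that fits one AR model per hidden variable and reconstructs a forecast for the raw streams:

    import numpy as np

    def fit_ar(series, ell):
        """Least-squares fit of a univariate AR(ell) model."""
        X = np.array([series[i:i + ell] for i in range(len(series) - ell)])
        coeffs, *_ = np.linalg.lstsq(X, series[ell:], rcond=None)
        return coeffs

    def forecast_raw(Y_hist, W, ell=3):
        """One-step-ahead forecast of the raw streams.

        Y_hist : (T, k) history of the hidden variables y_1, ..., y_T
        W      : (k, n) matrix whose rows are the current weights w_i[t]
        """
        k = Y_hist.shape[1]
        # One independent AR model per hidden variable (the hidden variables are uncorrelated).
        y_next = np.array([fit_ar(Y_hist[:, i], ell) @ Y_hist[-ell:, i] for i in range(k)])
        # x_hat_{t+1} = y_hat_{t+1,1} w_1[t] + ... + y_hat_{t+1,k} w_k[t]
        return W.T @ y_next

In a streaming setting the AR coefficients would of course be updated incrementally (e.g., with recursive least squares) rather than refit from scratch; the batch fit above is only for clarity.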
Correlations. Since the principal directions are orthogonal to one another
(w_i ⊥ w_j for i ≠ j), the components of y_t are by construction uncorrelated:
the correlations have already been captured by the w_i, 1 ≤ i ≤ k.
We can take advantage of this de-correlation to reduce forecasting
complexity. In particular, for auto-regression we found that one AR model
per hidden variable provides results comparable to multivariate AR.
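The decorrelation claim is easy to verify numerically; the sketch below (a batch-PCA stand-in for SPIRIT's incremental tracking, with made-up mixing weights) projects correlated streams onto the principal directions and checks that the resulting hidden variables are nearly uncorrelated:

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical example: n = 3 correlated streams driven by 2 latent sources.
    latent = rng.standard_normal((5000, 2))
    mix = np.array([[1.0, 0.8, 0.5],
                    [0.2, 0.6, 1.0]])
    X = latent @ mix + 0.05 * rng.standard_normal((5000, 3))
    X -= X.mean(axis=0)

    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Y = X @ Vt[:2].T                        # hidden variables: projections onto w_1, w_2
    print(np.round(np.corrcoef(X.T), 2))    # raw streams: large off-diagonal correlations
    print(np.round(np.corrcoef(Y.T), 2))    # hidden variables: off-diagonals ~ 0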
Auto-regression. Space complexity for multivariate AR (e.g., MUSCLES [58]) is O(n³ℓ²), where ℓ is the auto-regression window length.
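For a rough sense of scale (hypothetical numbers, and assuming one univariate AR(ℓ) model per hidden variable with O(ℓ²) space each, plus O(kn) space for the weight vectors w_i): with n = 1000 streams, window ℓ = 10 and k = 5 hidden variables, multivariate AR needs on the order of n³ℓ² = 10¹¹ values, whereas the per-hidden-variable approach needs on the order of kn + kℓ² = 5000 + 500 = 5500.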