DIMENSIONALITY REDUCTION AND FILTERING ON TIME SERIES SENSOR STREAMS - Managing and Mining Sensor Data

measurements [58], find routers that tend to go down together. In general, the discovered correlations and hidden variables have multiple uses. They provide a succinct summary to the user, they can help with fast forecasting and outlier detection, and they facilitate interpolation and the handling of missing values, as we discuss later.

After giving an illustrative example where correlations across many streams arise, we consider the case of a single stream. Even in this case correlations are present. These correlations arise across values of the same stream at different times, instead of across values from different streams. Values at different times are typically not independent, due to, for example, periodic or repeating patterns. These autocorrelations can be leveraged in similar ways to perform dimensionality reduction (compression or filtering) across time. In fact, the problems of dimensionality reduction, filtering, and forecasting are closely related, as we shall see.
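To make the notion of autocorrelation concrete, the following is a minimal sketch in Python with NumPy. The synthetic series, its period of 50 steps, the noise level, and the helper name `autocorrelation` are all assumptions chosen for illustration; none of them come from the chapter. The sketch computes the sample autocorrelation and looks for the lag at which the stream most strongly resembles a shifted copy of itself.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the average level
    denom = np.dot(x, x)                  # lag-0 energy, so that acf[0] == 1
    return np.array([np.dot(x[:len(x) - lag], x[lag:]) / denom
                     for lag in range(max_lag + 1)])

# Hypothetical stream: a pattern repeating every 50 steps, plus mild noise.
rng = np.random.default_rng(0)
period = 50
t = np.arange(1000)
series = np.sin(2 * np.pi * t / period) + 0.1 * rng.standard_normal(t.size)

acf = autocorrelation(series, max_lag=120)

# Very small lags are trivially correlated, so start the search at the
# first lag where the correlation has turned negative.
first_negative = int(np.argmax(acf < 0))
best_lag = first_negative + int(np.argmax(acf[first_negative:]))
print("strongest repeat at lag", best_lag)
```

A strong peak at a nonzero lag indicates that values one period apart are far from independent, which is exactly the redundancy that compression, filtering, and forecasting can exploit.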

For purposes of illustration, consider the example series in Figure 5.2a, which consists of automobile traffic counts on a large West Coast interstate. The data exhibit a clear daily periodicity. Also, within each day there is another distinct pattern of morning and afternoon rush hours. However, these peaks have distinctly different shapes: the morning one is more spread out, while the evening one is more concentrated and slightly sharper.

What we would ideally like to discover is: (i) that the main trend in the data repeats at a window (“period”) of approximately 4000 timestamps; and (ii) a succinct “description” of that main trend that captures most of the recurrent information.
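One simple way to look for such a window can be sketched as follows. This is an illustration under stated assumptions, not the chapter's algorithm: it cuts the series into non-overlapping windows, stacks them as rows of a matrix, and scores each candidate window length by the fraction of energy that the top-k singular vectors capture. The synthetic two-peak "day" of length 100, the candidate range, the choice k = 2, and the helper name `energy_captured` are hypothetical.

```python
import numpy as np

def energy_captured(series, window, k):
    """Fraction of energy kept by the top-k singular vectors of the windowed series."""
    n_rows = len(series) // window
    # Each row holds one consecutive, non-overlapping window of the series.
    X = np.asarray(series[:n_rows * window], dtype=float).reshape(n_rows, window)
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s[:k] ** 2) / np.sum(s ** 2))

# Hypothetical data: a wide "morning" peak and a narrow "evening" peak,
# repeated for 40 days of 100 timestamps each, plus noise.
rng = np.random.default_rng(1)
true_period = 100
phase = np.arange(true_period)
day = (np.exp(-0.5 * ((phase - 30) / 10.0) ** 2)
       + 1.2 * np.exp(-0.5 * ((phase - 70) / 4.0) ** 2))
series = np.tile(day, 40) + 0.05 * rng.standard_normal(40 * true_period)

# Score every candidate window length and keep the best one.
candidates = range(60, 141)
scores = {w: energy_captured(series, w, k=2) for w in candidates}
best_window = max(scores, key=scores.get)
print("best window:", best_window)
```

When the window matches the true period, all rows look alike and a couple of patterns explain almost everything; otherwise the rows are shifted copies of one another and the same number of patterns captures much less. Note that this raw score is only meaningful among windows that leave many more rows than k, and that multiples of the true period would score well too, so the candidate range here is deliberately narrow.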

Figure 5.2b shows the output of a pattern discovery approach, based on filtering techniques very similar to those used for cross-stream correlations. These patterns indeed suggest that the “best” window is 4000 timestamps. Furthermore, the first pattern captures the average and the second pattern correctly captures the two peaks and also their approximate shape (the first one wide and the second narrower). For comparison, Figure 5.2d shows the output of a fast, streaming computation scheme. In order to reduce the storage and computation requirements, our fast scheme tries to filter out some of the “noise” earlier, while retaining as many of the regularities as possible. However, which information should be discarded and which should be retained is once again decided based on the data itself. Thus, even though some information is unavoidably discarded, Figure 5.2d still correctly captures the main trends (average level, peaks and their shape).
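The idea of data-driven patterns that summarize each window can be sketched as follows. This is a batch illustration that assumes a plain SVD of non-overlapping windows stands in for the pattern discovery step; it is not the fast streaming scheme described above. The synthetic data (a base level plus two rush-hour peaks whose intensity varies by day) and the helper name `local_patterns` are hypothetical.

```python
import numpy as np

def local_patterns(series, window, k):
    """Top-k data-driven patterns of a series cut into non-overlapping windows."""
    n_rows = len(series) // window
    X = np.asarray(series[:n_rows * window], dtype=float).reshape(n_rows, window)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    patterns = Vt[:k]                  # k patterns, each of length `window`
    coeffs = X @ patterns.T            # k numbers per window instead of `window`
    approx = coeffs @ patterns         # filtered reconstruction of every window
    rel_error = np.linalg.norm(X - approx) / np.linalg.norm(X)
    return patterns, coeffs, rel_error

# Hypothetical data: base level 2.0 plus a wide and a narrow peak,
# with rush-hour intensity varying from day to day.
rng = np.random.default_rng(2)
window, n_days = 100, 40
phase = np.arange(window)
peaks = (np.exp(-0.5 * ((phase - 30) / 10.0) ** 2)
         + 1.2 * np.exp(-0.5 * ((phase - 70) / 4.0) ** 2))
scale = rng.uniform(0.7, 1.3, size=n_days)
days = 2.0 + scale[:, None] * peaks[None, :]
series = (days + 0.05 * rng.standard_normal(days.shape)).ravel()

patterns, coeffs, rel_error = local_patterns(series, window, k=2)

# Compare the first pattern with the average window (sign of an SVD
# vector is arbitrary, hence the absolute value).
mean_window = series.reshape(n_days, window).mean(axis=0)
cosine = abs(patterns[0] @ mean_window) / np.linalg.norm(mean_window)
print("relative error:", rel_error, "cosine with average window:", cosine)
```

In this sketch each window of 100 values is summarized by two coefficients, and what is discarded is determined by the data rather than by a fixed basis. With this kind of data the first pattern points almost exactly along the average window, while the second accounts for the day-to-day variation in the peaks.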

For comparison, Figure 5.2c shows the best “local patterns” obtained using fixed bases. For illustration, we chose the Discrete Cosine Transform (DCT) on the first window of 4000 points. First, with the notable
