measurements [58], find routers that tend to go down together. In gen-
eral, the discovered correlations and hidden variables have multiple uses.
They provide a succinct summary to the user, they can help to do fast
forecasting and detect outliers, and they facilitate interpolations and
handling of missing values, as we discuss later.
After giving an illustrative example where correlations across many
streams arise, we consider the case of a single stream. Even in this
case correlations are present. These correlations arise across values of
the same stream at different times, instead of across values from different
streams. Values at different times are typically not independent, due to,
for example, periodic or repeating patterns. These autocorrelations can
be leveraged in similar ways, to perform dimensionality reduction (com-
pression or filtering) across time. In fact, the problems of dimensionality
reduction, filtering, and forecasting are closely related, as we shall see.
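
To make the notion of correlation across time concrete, the following minimal sketch computes the sample autocorrelation of a single series and reads off a period as the first lag at which the autocorrelation peaks. The synthetic sine series, the constant PERIOD, and the helper names autocorrelation and first_peak_lag are assumptions made for illustration only; this is not the method developed in this chapter.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance

def first_peak_lag(series, max_lag):
    """Smallest lag >= 2 at which the autocorrelation has a local maximum."""
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 1)]
    for lag in range(2, max_lag):
        if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]:
            return lag
    return None

# Hypothetical data: a noise-free sine wave that repeats every PERIOD samples.
PERIOD = 24
series = [math.sin(2 * math.pi * t / PERIOD) for t in range(20 * PERIOD)]

estimated_period = first_peak_lag(series, max_lag=3 * PERIOD)
print(estimated_period)
```

Values one period apart are strongly correlated and values half a period apart are strongly anti-correlated, which is the kind of redundancy across time that compression, filtering, and forecasting can all exploit.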
For purposes of illustration, consider the example series in
Figure 5.2a, which consists of automobile traffic counts on a large West
Coast interstate. The data exhibit a clear daily periodicity. Also, within
each day there is another distinct pattern of morning and afternoon
rush hours. However, these peaks have distinctly different shapes: the
morning one is more spread out, the evening one more concentrated and
slightly sharper.
What we would ideally like to discover is: (i) The main trend in the
data repeats at a window (“period”) of approximately 4000 timestamps;
(ii) A succinct “description” of that main trend that captures most of
the recurrent information.
Figure 5.2b shows the output of a pattern discovery approach, based
on filtering techniques very similar to those used for cross-stream corre-
lations. These patterns indeed suggest that the “best” window is 4000
timestamps. Furthermore, the first pattern captures the average and the
second pattern correctly captures the two peaks and also their approx-
imate shape (the first one wide and the second narrower). For compar-
ison, Figure 5.2d shows the output of a fast, streaming computation
scheme. In order to reduce the storage and computation requirements,
our fast scheme tries to filter out some of the “noise” earlier, while retain-
ing as many of the regularities as possible. However, which information
should be discarded and which should be retained is once again decided
based on the data itself. Thus, even though some information is un-
avoidably discarded, Figure 5.2d still correctly captures the main trends
(average level, peaks and their shape).
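
The idea of letting the data choose both the window and the pattern can be sketched as follows. This is a simplified illustration under stated assumptions, not the chapter's algorithm: it cuts a synthetic series into non-overlapping windows of a candidate length, finds the single dominant pattern of those windows by power iteration, and scores the candidate by the fraction of the series energy that this one pattern captures. The synthetic profile (a wide morning peak and a narrow evening peak), the candidate range, and all function names are hypothetical.

```python
import math

def split_windows(series, w):
    """Cut the series into consecutive non-overlapping windows of length w."""
    return [series[i * w:(i + 1) * w] for i in range(len(series) // w)]

def dominant_pattern(rows, iters=30):
    """Power iteration for the top right singular vector of the window matrix."""
    w = len(rows[0])
    v = [1.0 / math.sqrt(w)] * w
    for _ in range(iters):
        # u = X v, then v = X^T u, then renormalize v to unit length.
        u = [sum(r[j] * v[j] for j in range(w)) for r in rows]
        v = [sum(u[i] * rows[i][j] for i in range(len(rows))) for j in range(w)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

def captured_energy(series, w):
    """Fraction of the windowed energy explained by the dominant pattern."""
    rows = split_windows(series, w)
    v = dominant_pattern(rows)
    total = sum(x * x for r in rows for x in r)
    projected = sum(sum(r[j] * v[j] for j in range(w)) ** 2 for r in rows)
    return projected / total

# Hypothetical "daily" profile: baseline, wide morning peak, narrow evening peak.
TRUE_PERIOD = 48
profile = [
    1.0
    + 2.0 * math.exp(-((t - 16) / 5.0) ** 2)
    + 3.0 * math.exp(-((t - 36) / 2.0) ** 2)
    for t in range(TRUE_PERIOD)
]
series = profile * 30  # thirty identical "days"

candidates = range(40, 57)
best_w = max(candidates, key=lambda w: captured_energy(series, w))
print(best_w, round(captured_energy(series, best_w), 4))
```

When the window length matches the true period, every window is a copy of the same profile, so a single pattern explains essentially all of the energy; for misaligned windows the peaks drift from window to window and one pattern captures noticeably less. The sketch keeps only one pattern and compares candidates in a narrow range; multiples of the true period would score equally well on exactly periodic data, so a practical method would need a rule for preferring the smallest such window.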
For comparison, Figure 5.2c shows the best “local patterns” obtained
using fixed bases. For illustration, we chose the Discrete Cosine Trans-
form (DCT) on the first window of 4000 points. First, with the notable