[Figure 5.2 panels; horizontal axis: time]
(a) Automobile traffic data (aggregate counts from a west coast interstate).
(b) Representative patterns (batch, non-hierarchical), 5.4% error: first and second pattern, time 1..4000.
(c) Fixed bases (DCT) with highest coefficients, 7.7% error (8.4% with two coefficients): first, second, and third basis, time 1..4000.
(d) Representative patterns (streaming, hierarchical): first and second pattern, time 1..3645.
Figure 5.2. Automobile traffic, best selected window (about 1 day) and corresponding representative patterns.
exception of wavelets, most fixed-basis schemes cannot easily be used
to capture information at arbitrary time scales. Also, any fixed-basis
scheme (e.g., wavelets or Fourier) would produce similar results, which
are heavily biased towards the shape of the a priori chosen bases or
approximating functions. On the other hand, when bases are discovered
from the data, we need additional storage space to represent them
explicitly, which is not necessary when the bases are given.
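The trade-off between fixed and discovered bases can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the experiments behind Figure 5.2: the synthetic "rush hour" windows, the window length `w`, the helper names (`dct_basis`, `learned_pattern`, `relative_error`), and the use of plain power iteration to find a single data-derived pattern. It compares the energy lost when each window is summarized by one coefficient on the best DCT vector against one coefficient on a pattern learned from the windows, and counts the values each scheme has to store.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dct_basis(w):
    """Orthonormal DCT-II vectors of length w: a fixed basis, known in advance."""
    return [
        [math.sqrt((1.0 if k == 0 else 2.0) / w) * math.cos(math.pi * (n + 0.5) * k / w)
         for n in range(w)]
        for k in range(w)
    ]

def learned_pattern(windows, iters=100):
    """Leading pattern discovered from the data (power iteration on X^T X)."""
    w = len(windows[0])
    v = [1.0 / math.sqrt(w)] * w  # assumes the data are not orthogonal to a constant
    for _ in range(iters):
        proj = [dot(x, v) for x in windows]  # X v
        nxt = [sum(p * x[n] for p, x in zip(proj, windows)) for n in range(w)]  # X^T (X v)
        norm = math.sqrt(dot(nxt, nxt))
        v = [a / norm for a in nxt]
    return v

def relative_error(windows, basis):
    """Fraction of signal energy lost when projecting onto orthonormal vectors."""
    total = sum(dot(x, x) for x in windows)
    kept = sum(dot(x, b) ** 2 for x in windows for b in basis)
    return (total - kept) / total

# Synthetic stand-in for daily traffic (an assumption): two bumps per window,
# a per-day amplitude, and a small deterministic perturbation.
w, days, k = 32, 12, 1
shape = [math.exp(-((n - 8) / 2.0) ** 2) + 0.8 * math.exp(-((n - 22) / 3.0) ** 2)
         for n in range(w)]
windows = [
    [(1.0 + 0.1 * (d % 5)) * shape[n] + 0.02 * math.sin(1.7 * d + 0.9 * n)
     for n in range(w)]
    for d in range(days)
]

# Fixed basis: keep the k DCT vectors that capture the most energy overall.
basis = dct_basis(w)
energy = [sum(dot(x, b) ** 2 for x in windows) for b in basis]
best = sorted(range(w), key=lambda i: energy[i], reverse=True)[:k]
err_dct = relative_error(windows, [basis[i] for i in best])

# Discovered basis: one pattern learned from the windows themselves.
pattern = learned_pattern(windows)
err_learned = relative_error(windows, [pattern])

# Values stored: k coefficients per window, plus the basis itself if it was learned
# (the k small indices naming the chosen DCT vectors are ignored here).
storage_fixed = days * k
storage_learned = days * k + k * w

print(f"DCT error {err_dct:.3f}, learned error {err_learned:.3f}")
print(f"values stored: fixed {storage_fixed}, learned {storage_learned}")
```

With a single vector, the learned pattern cannot lose more energy than the best DCT vector once the iteration has converged, but it costs `k * w` extra stored values, which is the storage overhead the text refers to.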
In general, collections of semi-infinite, time-evolving streams can be
modeled as values organized along several “dimensions”¹. One “dimension”
corresponds to the different streams in the collection. We start by
describing techniques that apply in this case. Time is another “dimension,”
which is somewhat special since it has an inherent ordering; we then see
how techniques for cross-stream analysis can be adapted for “cross-time”
analysis.
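One way to picture the two “dimensions” is as two ways of cutting the same data into vectors. The sketch below is only an illustration under stated assumptions: the function names `cross_stream_rows` and `cross_time_rows` are hypothetical, and using non-overlapping windows of length `w` is one possible choice, not necessarily the one used later in the chapter. Both functions are generators, so they can consume streams that never end.

```python
def cross_stream_rows(ticks):
    """Cross-stream view: one vector per time tick, holding the value of every stream."""
    for values in ticks:
        yield list(values)

def cross_time_rows(stream, w):
    """Cross-time view: one vector per non-overlapping window of w values from one stream."""
    window = []
    for value in stream:
        window.append(value)
        if len(window) == w:
            yield window
            window = []
    # An incomplete trailing window is never emitted; it waits for more data.

# Three hypothetical streams observed for six ticks.
ticks = [(t, 10 * t, 100 * t) for t in range(6)]
rows = list(cross_stream_rows(ticks))

# One hypothetical stream of seven values cut into windows of length 3.
windows = list(cross_time_rows(iter(range(7)), 3))

print(rows)
print(windows)
```

In both views the output is a sequence of fixed-length vectors, which is why a method written for cross-stream vectors can, in principle, be reused on windows of a single stream.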
We should emphasize that dimensionality reduction, filtering, and
forecasting on time series data have been broadly studied in several
disciplines. However, in this chapter we focus specifically on work in the
context of data mining and knowledge discovery, with a special emphasis
on streams and sensor data.
¹ Here, “dimension” does not have the typical meaning in the linear algebraic sense.
 
 