Similarity Search in Time-Series Data
A time-series data set consists of sequences of numeric values obtained from repeated
measurements over time. The values are typically measured at equal time intervals (e.g.,
every minute, hour, or day). Time-series databases are popular in many applications
such as stock market analysis, economic and sales forecasting, budgetary analysis, util-
ity studies, inventory studies, yield projections, workload projections, and process and
quality control. They are also useful for studying natural phenomena (e.g., atmosphere,
temperature, wind, earthquake), scientific and engineering experiments, and medical
treatments.
Unlike normal database queries, which find data that match a given query exactly,
a similarity search finds data sequences that differ only slightly from the given query
sequence. Many time-series similarity queries require subsequence matching, that is,
finding a set of sequences that contain subsequences that are similar to a given query
sequence.
For similarity search, it is often necessary to first perform data or dimensionality
reduction and transformation of time-series data. Typical dimensionality reduction
techniques include (1) the discrete Fourier transform (DFT), (2) discrete wavelet transforms
(DWT), and (3) singular value decomposition (SVD) based on principal components
analysis (PCA). Because we touched on these concepts in Chapter 3, and because a thorough
explanation is beyond the scope of this topic, we will not go into great detail here. With
such techniques, the data or signal is mapped to a signal in a transformed space. A small
subset of the “strongest” transformed coefficients is saved as features.
These features form a feature space, which is a projection of the transformed space.
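The reduction step described above can be illustrated with a minimal sketch. It assumes a naive orthonormal DFT and keeps the first k coefficients as features, which is one common choice rather than necessarily the “strongest” ones by magnitude. The function names (dft, dft_features, euclidean) and the two sample series are hypothetical and chosen only for illustration.

```python
import cmath
import math


def dft(series):
    """Orthonormal discrete Fourier transform (naive O(n^2) version)."""
    n = len(series)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        for k in range(n)
    ]


def dft_features(series, k):
    """Keep the first k DFT coefficients as the feature vector (an assumption)."""
    return dft(series)[:k]


def euclidean(a, b):
    """Euclidean distance; works for real or complex vectors."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical series of equal length.
s1 = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 2.0]
s2 = [1.5, 2.5, 2.5, 4.0, 3.5, 1.5, 1.0, 2.5]

true_dist = euclidean(s1, s2)                      # distance in the time domain
feature_dist = euclidean(dft_features(s1, 3),      # distance in feature space
                         dft_features(s2, 3))
```

With the orthonormal scaling used here, the transform preserves Euclidean distance (Parseval's theorem), so the distance computed on a truncated feature vector never exceeds the true distance. A search on the features can therefore discard candidates safely and verify the remaining ones against the full series.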
Indices can be constructed on the original or transformed time-series data to speed
up a search. For a query-based similarity search, techniques include normalization
transformation, atomic matching (i.e., finding pairs of gap-free windows of a small
length that are similar), window stitching (i.e., stitching similar windows to form pairs
of large similar subsequences, allowing gaps between atomic matches), and subse-
quence ordering (i.e., linearly ordering the subsequence matches to determine whether
enough similar pieces exist). Numerous software packages exist for similarity search in
time-series data.
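As a rough illustration of the first two steps named above, the sketch below normalizes each window and then performs atomic matching by brute force. It assumes z-normalization as the normalization transformation and Euclidean distance with a threshold epsilon as the similarity test; window stitching and subsequence ordering are not shown. The names z_normalize and atomic_matches and the sample data are hypothetical.

```python
import math


def z_normalize(window):
    """Shift a window to zero mean and scale it to unit standard deviation."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0.0:
        return [0.0] * len(window)  # flat window: no shape to compare
    return [(x - mean) / std for x in window]


def atomic_matches(s, t, width, epsilon):
    """Return (i, j) pairs where s[i:i+width] and t[j:j+width] are similar.

    Every gap-free window of the given width in s is compared with every
    such window in t after normalization (brute force, for clarity only).
    """
    s_windows = [z_normalize(s[i:i + width]) for i in range(len(s) - width + 1)]
    t_windows = [z_normalize(t[j:j + width]) for j in range(len(t) - width + 1)]
    matches = []
    for i, a in enumerate(s_windows):
        for j, b in enumerate(t_windows):
            dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
            if dist <= epsilon:
                matches.append((i, j))
    return matches


# Hypothetical data: t repeats the shape of s at ten times the scale.
s = [1.0, 2.0, 3.0, 2.0, 1.0, 5.0]
t = [10.0, 20.0, 30.0, 20.0, 10.0, 0.0]
pairs = atomic_matches(s, t, width=3, epsilon=1e-6)
```

Because each window is normalized before comparison, windows that differ only in offset or amplitude still match. A stitching step would then join nearby pairs from this list into longer similar subsequences.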
Recently, researchers have proposed transforming time-series data into piecewise
aggregate approximations so that the data can be viewed as a sequence of symbolic rep-
resentations. The problem of similarity search is then transformed into one of matching
subsequences in symbolic sequence data. We can identify motifs (i.e., frequently occur-
ring sequential patterns) and build index or hashing mechanisms for an efficient search
based on such motifs. Experiments show this approach is fast and simple, and has
comparable search quality to that of DFT, DWT, and other dimensionality reduction
methods.
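A minimal sketch of this symbolic transformation follows. It assumes a SAX-style scheme, which the text does not name: the series is z-normalized, reduced by piecewise aggregate approximation (PAA), and each segment mean is mapped to a letter using breakpoints that split the standard normal distribution into equally probable regions. The function names, the four-letter alphabet, and the sample series are illustrative assumptions.

```python
import math
from statistics import NormalDist


def z_normalize(series):
    """Shift a series to zero mean and scale it to unit standard deviation."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    if std == 0.0:
        return [0.0] * len(series)
    return [(x - mean) / std for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-length segment."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by segments")
    size = n // segments
    return [sum(series[i:i + size]) / size for i in range(0, n, size)]


def to_symbols(values, alphabet="abcd"):
    """Map each value to a letter via equiprobable standard-normal breakpoints."""
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    # The number of breakpoints a value exceeds is its letter index.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in values)


# Hypothetical series: a low plateau followed by a high plateau.
series = [1.0, 1.0, 2.0, 2.0, 8.0, 8.0, 9.0, 9.0]
word = to_symbols(paa(z_normalize(series), segments=4))
```

Once series are reduced to short words like this, subsequence matching and motif discovery can reuse ordinary string techniques such as hashing or substring indexing.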
Regression and Trend Analysis in Time-Series Data
Regression analysis of time-series data has been studied substantially in the fields of
statistics and signal analysis. However, one may often need to go beyond pure regression
 