minimum extent (the minimum time extent for considering an episode relevant)
for state TA, and the slope (the minimum allowed rate of change in an episode)
for trend TA.
Complex TA [6] can be defined as well: instead of aggregating events into
episodes, complex TA aggregate two series of episodes into a set of higher level
episodes (i.e., they abstract output intervals over precalculated input intervals).
In particular, complex abstractions search for specific temporal relationships
between episodes that can be generated from a basic abstraction or from other
complex abstractions. The relation between intervals can be any of the temporal
relations defined by Allen [3]. This kind of TA can be exploited to extract
patterns that depend on the course of several features, or to detect patterns of
complex shapes (e.g. a peak) in a single feature.
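
As an illustration of the idea above, the following minimal sketch builds "peak"
episodes from two precalculated series of trend episodes by looking for Allen's
MEETS relation between them. The names Episode, meets and complex_abstraction,
as well as the sample intervals, are hypothetical and chosen only for this
example; they are not taken from [6].

```python
from typing import Callable, List, NamedTuple


class Episode(NamedTuple):
    """An interval [start, end] over which an abstraction holds."""
    start: float
    end: float
    label: str


def meets(a: Episode, b: Episode) -> bool:
    """Allen's MEETS relation: a ends exactly where b starts."""
    return a.end == b.start


def complex_abstraction(
    first: List[Episode],
    second: List[Episode],
    relation: Callable[[Episode, Episode], bool],
    label: str,
) -> List[Episode]:
    """Aggregate pairs of episodes that satisfy `relation` into
    higher level episodes spanning both input intervals."""
    result = []
    for a in first:
        for b in second:
            if relation(a, b):
                result.append(
                    Episode(min(a.start, b.start), max(a.end, b.end), label)
                )
    return result


# Hypothetical output of two basic trend abstractions on one feature.
increasing = [Episode(0, 3, "increasing"), Episode(7, 9, "increasing")]
decreasing = [Episode(3, 5, "decreasing"), Episode(10, 12, "decreasing")]

peaks = complex_abstraction(increasing, decreasing, meets, "peak")
print(peaks)  # one peak episode covering the interval from 0 to 5
```

Only the first pair is aggregated, since the second increasing episode ends at 9
while the second decreasing episode starts at 10. Any other Allen relation could
be passed in place of meets.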
If the time series has been pre-processed through TA, similarity-based retrieval
can benefit from the use of pattern matching techniques. Sequence matching can
in fact be performed by a number of well-established methods [49], such as dynamic
programming based on the edit distance approach [52], suffix tree-based
approaches [53], or general formal transformations of patterns [20]. For example,
the framework in [20] defines similarity between a pattern A and a pattern B
(in a formal pattern language P) as a function of the transformations (defined
on a transformation language T) needed to reduce B to A (or vice versa). The
approach also allows one to answer queries such as “find all patterns similar to
some pattern A, but not similar to pattern B”. With symbolic time series it is
also easier to apply data mining and knowledge discovery methods, which can
help to find nontrivial knowledge patterns in the abstracted data [14].
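
The edit distance approach mentioned above can be sketched with a standard
dynamic programming recurrence over two symbol sequences. The sketch also shows
one simple way to answer a "similar to A, but not similar to B" query by
thresholding the distance. This is an assumption made for illustration and not
the transformation framework of [20]. The symbols (I for increasing, S for
stationary, D for decreasing), the threshold and the function names are
hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence a into sequence b (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty prefix takes i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similar_to_a_not_b(candidates: List[str], a: str, b: str,
                       threshold: int) -> List[str]:
    """Keep candidates within `threshold` edits of a and farther than
    `threshold` edits from b."""
    return [c for c in candidates
            if edit_distance(c, a) <= threshold
            and edit_distance(c, b) > threshold]


# Hypothetical trend strings, one symbol per episode.
candidates = ["IISDD", "IISSD", "DDSII"]
hits = similar_to_a_not_b(candidates, a="IISDD", b="IISSS", threshold=1)
print(hits)  # ['IISDD']
```

Here "IISSD" is discarded because it is within one edit of both patterns, and
"DDSII" because it is far from pattern A.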
Observe that TA are not the only methodology for transforming a time series
into a sequence of symbols. A large number of symbolic representations
of time series have been introduced in the past decades (see [11] for a survey).
However, some of them require strong domain knowledge, since they partition
the signal a priori into intervals, naturally provided by the underlying system
dynamics, which divide the overall time period into distinct physical phases (e.g.
respiration cycles in [14]; see also [21]). Many other approaches to symbolization
are weakened by two further issues. First, even if distance measures can be
defined on symbolic representations, these measures have little correlation
with distance measures defined over the original time series. Second, the
conversion to a symbolic representation requires access to all the data from
the beginning, which makes it unusable in a data streaming context.
Rather interestingly, Lin [26] has introduced an alternative to TA, capable of
dealing with the issues above, in which intervals are first obtained through PCA and
subsequently labeled with proper symbols. In particular, this contribution allows
distance measures to be defined on the symbolic representation that lower bound the
corresponding distance measures defined on the original data. Such a feature
makes it possible to run some well-known data mining algorithms on the transformed
data, obtaining results identical to those obtained on the original data,
while gaining in efficiency.
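
The lower bounding property can be illustrated with a small sketch. It is not
the method of [26] itself: it assumes that each interval is summarized by the
mean of an equal-length segment, that symbols are assigned by comparing those
means with fixed cut points, and that the series length is a multiple of the
number of segments. The cut points in BREAKPOINTS, the function names and the
sample data are all hypothetical, and any normalization of the series is
omitted.

```python
import math
from typing import List, Sequence

# Assumed cut points splitting the value axis into 4 regions (symbols 0..3).
BREAKPOINTS = [-0.67, 0.0, 0.67]


def segment_means(series: Sequence[float], n_segments: int) -> List[float]:
    """Mean of each equal-length segment (length must divide evenly)."""
    size = len(series) // n_segments
    return [sum(series[i * size:(i + 1) * size]) / size
            for i in range(n_segments)]


def symbolize(series: Sequence[float], n_segments: int) -> List[int]:
    """Label each segment with the index of the region its mean falls in."""
    return [sum(m > b for b in BREAKPOINTS)
            for m in segment_means(series, n_segments)]


def symbol_gap(r: int, c: int) -> float:
    """Smallest possible difference between values in regions r and c."""
    if abs(r - c) <= 1:
        return 0.0  # equal or adjacent regions can be arbitrarily close
    return BREAKPOINTS[max(r, c) - 1] - BREAKPOINTS[min(r, c)]


def symbolic_distance(p: Sequence[int], q: Sequence[int],
                      series_length: int) -> float:
    """Distance on symbols that never exceeds the Euclidean distance
    between the original series, under the assumptions stated above."""
    scale = series_length / len(p)  # points per segment
    return math.sqrt(scale * sum(symbol_gap(r, c) ** 2
                                 for r, c in zip(p, q)))


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


x = [-1.0, -1.2, 0.1, 0.3, 1.0, 1.4, 0.2, -0.1]
y = [1.1, 0.9, 0.0, -0.2, -1.3, -0.9, 0.1, 0.3]

sx, sy = symbolize(x, 4), symbolize(y, 4)
lower = symbolic_distance(sx, sy, len(x))
exact = euclidean(x, y)
print(sx, sy)          # [0, 2, 3, 2] [3, 1, 0, 2]
print(lower <= exact)  # True
```

Because the symbolic distance never overestimates the distance on the original
data, it can be used to discard candidates safely before the exact distance is
computed, which is where the efficiency gain comes from.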
 