Discovery of Driving Behavior Patterns - Smart Information Systems: Computational Intelligence for Real-Life Applications


positions. The authors demonstrate that order invariance is an important consideration

for domains such as automotive engineering and smart home environments [ 33 , 35 ],

where multiple sensors observe contextual patterns in their naturally occurring order,

and time series are compared according to the occurrence of these multivariate patterns.

Evaluation Criterion . Evaluation criteria for clustering differ depending on whether

the ground truth is known or unknown [ 14 ]. When the ground truth is known,

the similarity between known clusters and obtained clusters can be measured. The

most commonly used clustering quality measure for known ground truth is the Rand

Index or minor variants of it [ 40 ]. In contrast, without prior knowledge the clusters

are usually evaluated according to their within-cluster similarity and between-cluster

dissimilarity [ 14 ]. Various validity indices have been proposed to determine the

number of clusters and their goodness. For instance, the index I has been found to

be consistent and reliable, irrespective of the underlying clustering technique and

data dimensionality, and furthermore has been shown to outperform the Dunn and

Davies-Bouldin index [ 24 ].
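
For the known ground truth case, the plain Rand Index can be sketched as the fraction of object pairs on which two partitions agree, that is, pairs placed together in both or apart in both. The following is a minimal illustration only, not the implementation used by the cited authors; the function name rand_index and the label lists are assumptions made for this example.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which the two partitions agree.

    Hypothetical helper for illustration; labels may be any hashable values.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same objects")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("at least two objects are required")

    agreements = 0
    for i, j in pairs:
        same_in_truth = labels_true[i] == labels_true[j]
        same_in_pred = labels_pred[i] == labels_pred[j]
        # A pair counts as an agreement when both partitions put the two
        # objects together, or both keep them apart.
        if same_in_truth == same_in_pred:
            agreements += 1
    return agreements / len(pairs)
```

Because only pair co-membership is compared, the measure does not depend on how the clusters are named: relabeling every cluster leaves the value unchanged.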

Realistic Assumptions . The majority of publicly available time series datasets

were preprocessed and cleaned before publishing. For instance, the UCR archive [ 9 ]

contains only time series with equal length, which are mostly snippets of the origi-

nal data that were retrieved manually. The publication of perfectly aligned patterns

of equal length has led to a huge number of time series classification and clustering

algorithms that are not able to deal with real-world data, which contains irrelevant

sections. Hu et al. [ 5 ] suggest automatically building a data dictionary, which contains

only a small subset of the training data and neglects irrelevant sections and redun-

dancies. The evaluations show that using a data dictionary with a set of retrieved

subsequences for each class leads to higher classification accuracy and is several

times faster than the compared strawman algorithms. However, one needs to be careful

about how to retrieve subsequences, for reasons explained in the following.

Subsequence Clustering . Keogh and Lin [ 12 ] state that the clustering of time

series subsequences is meaningless, referring to the finding that the output does not

depend on input, and the resulting cluster centers are close to random ones. In almost

all cases the subsequences are extracted with a sliding window, which is assumed to

be the cause of this quirk in clustering. To produce meaningful results, the authors suggest adopting

time series motifs, a concept highly related to clusters. Their experiments demonstrate

that motif-based clustering is able to preserve the patterns found in the original time

series data [ 12 ].
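
The sliding-window extraction discussed above can be sketched as follows. This is only an illustration of how the subsequences are typically produced, under the assumption of a one-dimensional series held in a Python list; the function name sliding_windows and its parameters width and step are hypothetical and not taken from the cited work.

```python
def sliding_windows(series, width, step=1):
    """Return every subsequence of length `width`, moving `step` points at a time.

    Hypothetical helper for illustration. With step=1, neighbouring windows
    share width - 1 points, so the extracted subsequences overlap heavily.
    """
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    last_start = len(series) - width
    # range() is empty when the series is shorter than the window.
    return [list(series[start:start + width])
            for start in range(0, last_start + 1, step)]
```

For a series of length n this yields n - width + 1 windows when step is 1, and consecutive windows differ in only one point at each end.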

Time Series Motifs . Motifs are previously unknown, frequently occurring

patterns, which are useful for various time series mining tasks, such as summarization,

visualization, clustering and classification of time series [ 2 , 16 ]. According

to the definition [ 16 ] a time series motif is a subsequence that comprises all non-

trivial matches within a given range. Since the naive (brute-force) approach to motif

discovery has quadratic complexity, Lin et al. [ 16 ] introduce a new motif discov-

ery algorithm that provides fast exact answers, and faster approximate answers,

achieving a speedup of one to two orders of magnitude. In order to reduce the num-
