Discovery of Driving Behavior Patterns - Smart Information Systems: Computational Intelligence for Real-Life Applications

12.3 Related Work

The main goal of clustering is to organize unlabeled data into homogeneous groups

that are clearly separated from each other. In general, clustering involves three components:

the clustering algorithm, the similarity (or distance) measure, and the evaluation criterion.

Clustering algorithms are categorized into partitioning, hierarchical, density-based,

grid-based, and model-based methods. All of these clustering algorithms can be

applied to both static and temporal data [ 14 ]. In the following, we discuss important

considerations, common pitfalls, successful applications, and recent developments

in time series clustering.

Time Series Clustering. Unlike static data, temporal data evolves over time and

therefore requires special handling. One could either modify the existing clustering

algorithms to handle time series data or convert the time series into a form that

can be directly clustered. The former approach works with the raw time series, and

the major modification lies in replacing the distance/similarity measure. The latter

approach converts the raw time series either into feature vectors or model parameters,

and then applies conventional clustering algorithms. Thus, time series clustering

approaches can be categorized into raw-data-based, feature-based, and model-based

methods [ 14 ].
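
The conversion step of the feature-based and model-based approaches can be illustrated with a minimal sketch. The particular features (mean, standard deviation, linear trend) and the AR(1) model are illustrative assumptions, not choices made in this chapter, and the function names are hypothetical. The point is only that series of different lengths are mapped to fixed-length descriptions that a conventional clustering algorithm can consume.

```python
import statistics


def extract_features(series):
    """Feature-based view: summarize a series as (mean, std, slope).

    The choice of features is an assumption made for illustration.
    """
    n = len(series)
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # Least-squares slope of the values against the time index 0..n-1.
    t_mean = (n - 1) / 2
    num = sum((t - t_mean) * (x - mean) for t, x in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den if den else 0.0
    return (mean, std, slope)


def fit_ar1(series):
    """Model-based view: least-squares coefficient phi of x[t] = phi * x[t-1].

    An AR(1) model without intercept, chosen here only as a simple example.
    """
    num = sum(cur * prev for cur, prev in zip(series[1:], series[:-1]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den if den else 0.0


# Two series of unequal length map to descriptions of equal size.
short_series = [1, 2, 3, 4, 5]
long_series = [2, 2, 3, 5, 4, 6, 7, 7]
feature_vectors = [extract_features(s) for s in (short_series, long_series)]
model_parameters = [fit_ar1(s) for s in (short_series, long_series)]
```

The resulting tuples can be passed to any clustering algorithm designed for static data, which is what distinguishes these two approaches from the raw-data-based one.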

Time Series Representation. In this study, we mainly focus on clustering methods

that work with raw data, in particular multivariate time series with the same sampling

rate. Clustering time series only differs from conventional clustering in how to com-

pute the similarity between data objects [ 14 ]. Therefore, the key is to understand the

unique characteristics of the time series and then to design an appropriate similar-

ity measure accordingly. For instance, Meesrikamolkul et al. [ 25 ] have proposed a

novel method which combines the widely used k-means clustering algorithm with the

Dynamic Time Warping distance measure, instead of the traditional Euclidean distance,

to study sequences with time shifts. Unlike before, the new method determines

cluster centers that preserve the characteristics of the data sequences.
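
To make the difference between the two distance measures concrete, the following is a minimal sketch of the textbook Dynamic Time Warping recurrence next to the Euclidean distance. It assumes univariate sequences and an absolute-difference local cost, and it covers only the distance computation, not the cluster-center procedure of [ 25 ]. The function names are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two univariate sequences.

    Uses the absolute difference as local cost (an assumption) and no
    warping-window constraint.
    """
    n, m = len(a), len(b)
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier point of b
                cost[i][j - 1],      # b[j-1] also matches an earlier point of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# The same shape, shifted by one time step.
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
print(dtw_distance(a, b))        # 0.0
print(euclidean_distance(a, b))  # 2.0
```

In this example the warping path absorbs the time shift completely, whereas the Euclidean distance penalizes every misaligned point.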

Distance/Similarity Measures. Besides Euclidean distance and Dynamic Time

Warping distance, commonly used similarity measures include Minkowski distance,

Levenshtein distance, Short Time Series distance, Pearson correlation coefficient,

cross-correlation-based distances, probability-based distance functions, and many

others. The choice of similarity measure depends on whether the time series is

discrete-valued or real-valued, uniformly or nonuniformly sampled, univariate or multivariate,

and whether the data sequences are of equal or unequal length [ 14 ].
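
Three of the measures named above can be sketched in a few lines to show how the data type constrains the choice. This is a minimal illustration under stated assumptions: the function names are hypothetical, the Pearson correlation is turned into a dissimilarity as 1 - r (one common convention among several), and the Minkowski and Pearson variants assume real-valued sequences of equal length, while the Levenshtein distance handles discrete-valued sequences of unequal length.

```python
import math


def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def pearson_distance(a, b):
    """Dissimilarity 1 - r, where r is the Pearson correlation coefficient."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return 1 - cov / math.sqrt(var_a * var_b)


def levenshtein_distance(s, t):
    """Edit distance between two discrete-valued sequences (e.g. strings)."""
    prev = list(range(len(t) + 1))
    for i, symbol_s in enumerate(s, start=1):
        curr = [i]
        for j, symbol_t in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (symbol_s != symbol_t),  # substitution or match
            ))
        prev = curr
    return prev[-1]


print(minkowski_distance([0, 0], [3, 4], p=1))   # 7.0
print(levenshtein_distance("kitten", "sitting"))  # 3
```

Note that the Pearson-based measure is insensitive to offset and positive scaling of either sequence, whereas the Minkowski distance is not.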

Distortions and Invariance. Furthermore, the choice of the time series distance

measure depends on the invariances required by the domain. Prior work [ 1 ] has

introduced techniques designed to efficiently measure similarity between time series

with invariance to (various combinations of) the distortions of warping, uniform scaling,

offset, amplitude scaling, phase, occlusions, uncertainty, and wandering baseline.
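
As one concrete case, invariance to offset and amplitude scaling is commonly obtained by z-normalizing each series before a distance is computed. The sketch below illustrates that general preprocessing step; it is not claimed to be the specific technique of [ 1 ], and the function name and the handling of constant series are assumptions.

```python
import math


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        # A constant series has no shape to preserve; map it to zeros (assumption).
        return [0.0] * n
    return [(x - mean) / std for x in series]


# The second series is the first one with a different offset and amplitude.
original = [1, 2, 3, 4]
distorted = [10 + 5 * x for x in original]

# After normalization both series coincide up to floating-point error.
gap = max(abs(u - v) for u, v in zip(z_normalize(original), z_normalize(distorted)))
```

Any distance applied to the normalized series then ignores these two distortions, while remaining sensitive to the others listed above, such as warping or phase.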

Recent work [ 32 ] has proposed an order-invariant distance which is able to deter-

mine the (dis)similarity of time series that exhibit similar subsequences at arbitrary
