12.3 Related Work
The main goal of clustering is to organize unlabeled data into homogeneous groups
that are clearly separated from each other. In general, clustering involves a
clustering algorithm, a similarity (or distance) measure, and an evaluation criterion.
Clustering algorithms are categorized into partitioning, hierarchical, density-based,
grid-based, and model-based methods. All of these clustering algorithms can be
applied to both static and temporal data [ 14 ]. In the following, we discuss important
considerations, common pitfalls, successful applications, and recent developments
in time series clustering.
Time Series Clustering . Unlike static data, temporal data evolves over time and
therefore requires special handling. One could either modify the existing clustering
algorithms to handle time series data or convert the time series into a form that
can be directly clustered. The former approach works with the raw time series, and
the major modification lies in replacing the distance/similarity measure. The latter
approach converts the raw time series either into feature vectors or model parameters,
and then applies conventional clustering algorithms. Thus, time series clustering
approaches can be categorized into raw-data-based, feature-based, and model-based
methods [ 14 ].
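To illustrate the feature-based route described above, the following is a minimal sketch that maps a raw univariate series to a fixed-length feature vector. The function name extract_features and the choice of features (mean, standard deviation, linear trend) are assumptions made for illustration; they are not taken from [ 14 ].

```python
import math


def extract_features(series):
    """Map a raw series to a fixed-length vector: (mean, std, slope).

    The three features are an illustrative assumption; any fixed-length
    summary would let a conventional clustering algorithm be applied.
    """
    n = len(series)
    if n == 0:
        raise ValueError("series must not be empty")
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    # Least-squares slope of the values against the time index 0..n-1.
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    if denom == 0:
        slope = 0.0  # a single observation has no trend
    else:
        slope = sum((t - t_mean) * (x - mean)
                    for t, x in enumerate(series)) / denom
    return (mean, std, slope)
```

Because every series is reduced to a vector of the same length, series of unequal length become directly comparable, at the cost of discarding whatever the chosen features do not capture.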
Time Series Representation . In this study, we mainly focus on clustering methods
that work with raw data, in particular multivariate time series with the same sampling
rate. Clustering time series differs from conventional clustering only in how the
similarity between data objects is computed [ 14 ]. Therefore, the key is to understand the
unique characteristics of the time series and then to design an appropriate similarity
measure accordingly. For instance, Meesrikamolkul et al. [ 25 ] have proposed a
novel method that combines the widely used k-means clustering algorithm with the
Dynamic Time Warping distance measure, instead of the traditional Euclidean
distance, to study sequences with time shifts. Unlike earlier approaches, the new method
determines cluster centers that preserve the characteristics of the data sequences.
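The difference between the two distance measures mentioned above can be sketched as follows. This is the textbook dynamic-programming formulation of Dynamic Time Warping for univariate sequences with an absolute-difference local cost; it is an illustrative assumption, not the implementation of [ 25 ], and it does not cover their cluster-center computation.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two univariate sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance; requires equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# The same bump, shifted by one time step.
a = [0, 0, 1, 2, 1, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0]
shifted_dtw = dtw_distance(a, b)
shifted_euclidean = euclidean_distance(a, b)
```

For the shifted pair above, the warping path absorbs the time shift, so the DTW distance is smaller than the Euclidean distance, which penalizes every misaligned point.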
Distance/Similarity Measures . Besides Euclidean distance and Dynamic Time
Warping distance, commonly used similarity measures include Minkowski distance,
Levenshtein distance, Short Time Series distance, Pearson correlation coefficient,
cross-correlation-based distances, probability-based distance functions, and many
others. The choice of similarity measure depends on whether the time series is
discrete-valued or real-valued, uniformly or nonuniformly sampled, univariate or
multivariate, and whether the data sequences are of equal or unequal length [ 14 ].
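Two of the measures listed above can be sketched for real-valued, equal-length, univariate sequences. The function names are hypothetical, and turning the Pearson correlation coefficient into a dissimilarity as 1 - r is one common convention assumed here, not a definition taken from [ 14 ].

```python
import math


def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def pearson_distance(a, b):
    """Dissimilarity 1 - r, where r is the Pearson correlation coefficient."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return 1.0 - cov / (norm_a * norm_b)
```

The two measures behave differently under rescaling: multiplying one sequence by a positive constant changes its Minkowski distance to the other but leaves the correlation-based dissimilarity unchanged.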
Distortions and Invariance . Furthermore, the choice of the time series distance
measure depends on the invariance required by the domain. Prior work [ 1 ] has
introduced techniques designed to efficiently measure similarity between time series
with invariance to (various combinations of) the distortions of warping, uniform
scaling, offset, amplitude scaling, phase, occlusions, uncertainty, and wandering baseline.
Recent work [ 32 ] has proposed an order-invariant distance which is able to
determine the (dis)similarity of time series that exhibit similar subsequences at arbitrary