good clustering results can be achieved when the distances do not accumulate at
either end of the interval (i.e., are not all close to zero or one). Figure 12.9a shows the quality
control for our sample dataset.
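The quality criterion above can be illustrated with a small check. The sketch below is a minimal illustration, assuming that the pairwise distances are already normalized to [0, 1] and stored in a symmetric matrix; the helper names boundary_fractions and looks_informative, the margin of 0.05, and the 50% cut-off are hypothetical choices, not parameters of BestTime.

```python
import numpy as np


def boundary_fractions(dist, margin=0.05):
    """Fractions of pairwise distances that pile up near 0 and near 1.

    dist   : symmetric (n x n) matrix of distances normalized to [0, 1]
    margin : hypothetical tolerance for "close to zero or one"
    """
    d = np.asarray(dist, dtype=float)
    # Use each unordered pair once (strict upper triangle, no self-distances).
    iu = np.triu_indices(d.shape[0], k=1)
    pairs = d[iu]
    near_zero = float(np.mean(pairs <= margin))
    near_one = float(np.mean(pairs >= 1.0 - margin))
    return near_zero, near_one


def looks_informative(dist, margin=0.05, max_fraction=0.5):
    """Heuristic: distances are informative if they do not mostly sit at either end."""
    near_zero, near_one = boundary_fractions(dist, margin)
    return near_zero < max_fraction and near_one < max_fraction
```

A matrix whose off-diagonal entries are all 1 fails the check, while one with spread-out values such as 0.3, 0.5 and 0.6 passes.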
Clustering Validation. To support the user in choosing an optimal number k of
clusters or representatives, our tool evaluates the cluster goodness for varying
k according to three cluster validation indices. Figure 12.9b shows the cluster
validation for our sample dataset.
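The section does not name the three validation indices, so the following sketch swaps in a single, well-known one, the silhouette index, and pairs it with a simple alternating k-medoids clustering on a precomputed distance matrix (for example, pairwise RRR distances). All function names are hypothetical, and this is not the BestTime implementation; it also assumes k does not exceed the number of distinct series.

```python
import numpy as np


def k_medoids(dist, k, max_iter=100):
    """Simple alternating k-medoids on a precomputed distance matrix (a sketch, not PAM)."""
    d = np.asarray(dist, dtype=float)
    # Deterministic farthest-first initialization, starting from the most central point.
    medoids = [int(np.argmin(d.sum(axis=1)))]
    while len(medoids) < k:
        nearest = d[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    medoids = np.array(medoids)
    for _ in range(max_iter):
        labels = np.argmin(d[:, medoids], axis=1)  # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: member with the smallest summed distance within its cluster.
                within = d[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(d[:, medoids], axis=1), medoids


def silhouette(dist, labels):
    """Mean silhouette value from a distance matrix; singletons score 0."""
    d = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scores = []
    for i in range(d.shape[0]):
        own = labels == labels[i]
        if own.sum() == 1:
            scores.append(0.0)
            continue
        a = d[i, own].sum() / (own.sum() - 1)  # mean distance to own cluster (d[i, i] = 0)
        b = min(d[i, labels == c].mean() for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return float(np.mean(scores))


def validate_k(dist, k_values):
    """Silhouette score for each candidate number of clusters k (k >= 2)."""
    return {k: silhouette(dist, k_medoids(dist, k)[0]) for k in k_values}
```

Picking the k with the highest score is one way to compare candidate numbers of clusters; further indices could be evaluated over the same range of k in the same loop.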
Cluster Distribution. The clustering may result in groups of different sizes. Our tool
illustrates the cluster distribution to identify outliers and emphasize prominent
groups with expressive representatives. For our sample dataset, all clusters have
the same size (see Fig. 12.9b).
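A minimal sketch of the cluster distribution step, assuming cluster labels as produced by any clustering of the distance matrix. The function name and the relative-size threshold min_share used to flag potential outliers are hypothetical choices for illustration.

```python
import numpy as np


def cluster_distribution(labels, min_share=0.1):
    """Return cluster sizes, small (potential outlier) clusters, and the largest cluster."""
    labels = np.asarray(labels)
    ids, sizes = np.unique(labels, return_counts=True)
    shares = sizes / labels.size
    distribution = {int(c): int(s) for c, s in zip(ids, sizes)}
    # Clusters holding less than min_share of all series are flagged as potential outliers.
    outliers = [int(c) for c, share in zip(ids, shares) if share < min_share]
    # The most populated cluster is the most prominent group (first one on ties).
    prominent = int(ids[np.argmax(sizes)])
    return distribution, outliers, prominent
```

With equally sized clusters, as in the sample dataset, no cluster is flagged and the first cluster is reported as the largest.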
List of Representatives. Since we aim at finding representatives, our tool not
only shows a list of identified candidates, as illustrated in Fig. 12.9b, but also allows
the user to visualize the time intervals or patterns that co-occur in other time series of the
same cluster (see Fig. 12.9c).
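Assuming that each cluster's representative is its medoid (the member with the smallest summed distance to all other members of its cluster), a candidate list ranked by cluster size could be produced as follows. The function name and the ranking by size are illustrative assumptions, not the documented behavior of BestTime.

```python
import numpy as np


def representatives(dist, labels):
    """Return (cluster id, medoid index, cluster size) tuples, largest clusters first."""
    d = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    result = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        # Medoid: member with the smallest summed distance to the rest of its cluster.
        within = d[np.ix_(members, members)].sum(axis=1)
        result.append((int(c), int(members[np.argmin(within)]), int(members.size)))
    # Sort by cluster size, descending; ties keep cluster-id order (sorted is stable).
    return sorted(result, key=lambda r: -r[2])
```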
Please note that we provide supplementary online material [ 36 ], which includes
our BestTime tool for finding time series representatives, real-life test data,
a video demonstration, and a technical report.
12.11 Conclusion and Future Work
This work is a first attempt to address time series clustering with nonlinear data
analysis and modeling techniques commonly used by theoretical physicists. We adopted
recurrence plots (RPs) and recurrence quantification analysis (RQA) to measure the
(dis)similarity of multivariate time series that contain segments of similar trajectories
at arbitrary positions and in different order.
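To make the idea concrete, the sketch below builds a cross recurrence plot for two univariate time series and derives one standard RQA measure, determinism (DET): the fraction of recurrence points that lie on diagonal lines of at least l_min points. Diagonal lines anywhere in the plot, not only on the main diagonal, mark segments of similar trajectories at shifted positions, which is why such measures tolerate reordered patterns. Turning DET into the dissimilarity 1 - DET is a hypothetical choice for illustration only; it is not the RRR distance defined in this chapter, and the threshold eps and l_min = 2 are assumed values.

```python
import numpy as np


def cross_recurrence_plot(x, y, eps):
    """CR[i, j] = 1 if |x_i - y_j| <= eps, else 0 (univariate series)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (np.abs(x[:, None] - y[None, :]) <= eps).astype(int)


def determinism(cr, l_min=2):
    """Fraction of recurrence points that lie on diagonal lines of length >= l_min."""
    total = cr.sum()
    if total == 0:
        return 0.0
    n, m = cr.shape
    on_lines = 0
    # Scan every diagonal, from the bottom-left corner to the top-right corner.
    for offset in range(-(n - 1), m):
        run = 0
        for v in list(np.diagonal(cr, offset)) + [0]:  # trailing 0 closes the last run
            if v:
                run += 1
            else:
                if run >= l_min:
                    on_lines += run
                run = 0
    return on_lines / total


def rqa_dissimilarity(x, y, eps, l_min=2):
    """Hypothetical dissimilarity: 1 - DET of the cross recurrence plot."""
    return 1.0 - determinism(cross_recurrence_plot(x, y, eps), l_min)
```

For example, x = [0, 1, 2, 3] and y = [5, 0, 1, 2, 3] share the same segment shifted by one step, so all recurrence points form one diagonal line and the dissimilarity is 0.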
More specifically, we introduced the concept of joint cross recurrence plots
(JCRPs), a multivariate extension of traditional RPs, to visualize and investigate
recurring patterns in pairwise compared time series. Furthermore, we defined a
recurrence plot-based (RRR) distance measure to cluster (multivariate) time series in
an order-invariant way.
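A joint cross recurrence plot is commonly formulated as JCR(i, j) = ∏_v Θ(ε_v − |x_i^v − y_j^v|), where v runs over the d dimensions, Θ is the Heaviside step function, and each dimension has its own threshold ε_v; a pair (i, j) recurs only if all dimensions recur simultaneously. The sketch below assumes this formulation with absolute differences per dimension; the function name and the example thresholds are hypothetical.

```python
import numpy as np


def joint_cross_recurrence_plot(x, y, eps):
    """Binary JCRP of x (n x d) and y (m x d) with one threshold per dimension."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    eps = np.asarray(eps, dtype=float)
    # |x_i^v - y_j^v| for all i, j, v, giving shape (n, m, d).
    diff = np.abs(x[:, None, :] - y[None, :, :])
    # The product of Heaviside terms equals a logical AND over all dimensions.
    return np.all(diff <= eps, axis=2).astype(int)
```

For x = [[0, 0], [1, 1], [2, 2]] and y = [[1, 1], [2, 5], [0, 0]] with eps = [0.5, 0.5], the point pair (x_2, y_1) does not recur because only its first dimension matches.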
The proposed RRR distance was evaluated on both synthetic and real-life time
series and compared with the DTW distance. Our evaluation on synthetic data
demonstrates that the RRR distance is able to establish cluster centers that preserve
the characteristics of the (univariate and multivariate) sample time series. The results
on real-life vehicular data show that, in terms of our cost function, RRR performs
about 10% better than DTW, meaning that the determined prototypes contain about 10%
more recurring driving behavior patterns.
In addition, we have introduced BestTime, a Matlab tool implementing our
RRR distance to find time series representatives that best comprehend the recurring