Discovery of Driving Behavior Patterns - Smart Information Systems: Computational Intelligence for Real-Life Applications

good clustering results can be achieved when the distances do not accumulate at

either end of the interval (all close to zero or one). Figure 12.9 a shows the quality

control for our sample dataset.

Clustering Validation. To support the user in choosing an optimal number k of

clusters or representatives, our tool validates the cluster goodness for varying

k according to three cluster validation indexes. Figure 12.9 b shows the cluster

validation for our sample dataset.
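
The idea of scoring candidate values of k can be sketched as follows. This passage does not name the three validation indexes, so the mean silhouette width used here is an assumption chosen purely for illustration, and the function names `mean_silhouette` and `best_k` are hypothetical. The sketch is in Python rather than Matlab and works on any precomputed pairwise distance matrix.

```python
from typing import Dict, List, Sequence


def mean_silhouette(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette width of a labeling, given a precomputed distance matrix."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette width needs at least two clusters")

    total = 0.0
    for i, lab_i in enumerate(labels):
        own = clusters[lab_i]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != lab_i
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(labels)


def best_k(dist: Sequence[Sequence[float]], labelings: Dict[int, Sequence[int]]) -> int:
    """Return the k whose labeling has the highest mean silhouette width."""
    return max(labelings, key=lambda k: mean_silhouette(dist, labelings[k]))


# Toy example: four points on a line, forming two obvious groups.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
labelings = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
chosen = best_k(dist, labelings)
```

In practice the labelings for each k would come from the clustering itself; here they are written by hand to keep the example self-contained.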

Cluster Distribution. The clustering may result in groups of different size. Our tool

illustrates the cluster distribution to identify outliers and emphasize prominent

groups with expressive representatives. For our sample dataset all clusters have

the same size, see Fig. 12.9 b.

List of Representatives. Since we aim at finding representatives, our tool not

only shows a list of identified candidates, as illustrated in Fig. 12.9 b, but also

allows the user to visualize the time intervals or patterns that co-occur in other

time series of the same cluster, see Fig. 12.9 c.

Please note that we provide supplementary online material [36], which includes

our BestTime tool for finding time series representatives, real-life testing data,

a video demonstration, and a technical report.

12.11 Conclusion and Future Work

This work is a first attempt to solve time series clustering with nonlinear data

analysis and modeling techniques commonly used by theoretical physicists. We adopted

recurrence plots (RPs) and recurrence quantification analysis (RQA) to measure the

(dis)similarity of multivariate time series that contain segments of similar trajectories

at arbitrary positions and in different order.

More specifically, we introduced the concept of joint cross recurrence plots

(JCRPs), a multivariate extension of traditional RPs, to visualize and investigate

recurring patterns in pairwise compared time series. Furthermore, we defined a

recurrence plot-based (RRR) distance measure to cluster (multivariate) time series with

order invariance.
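
To make the notion of order invariance concrete, the following Python sketch builds a joint cross recurrence plot for two multivariate series and derives a distance from it. This section does not restate the formal definition of the RRR distance, so the sketch assumes a distance of one minus the determinism (the fraction of recurrence points that lie on diagonal lines of a minimum length), a standard RQA measure. The function names, the per-variable threshold `eps`, and the parameter `min_len` are hypothetical choices for illustration.

```python
from typing import List, Sequence

# One inner sequence per time step, holding one value per variable.
Series = Sequence[Sequence[float]]


def joint_cross_recurrence_plot(x: Series, y: Series, eps: float) -> List[List[int]]:
    """Entry [i][j] is 1 when x[i] and y[j] are within eps in every variable."""
    return [
        [int(all(abs(a - b) <= eps for a, b in zip(xi, yj))) for yj in y]
        for xi in x
    ]


def determinism(plot: List[List[int]], min_len: int = 2) -> float:
    """Fraction of recurrence points on diagonal lines of length >= min_len."""
    rows = len(plot)
    cols = len(plot[0]) if rows else 0
    total = sum(sum(row) for row in plot)
    if total == 0:
        return 0.0
    on_lines = 0
    # Walk every diagonal, identified by its constant offset j - i.
    for offset in range(-(rows - 1), cols):
        i = max(0, -offset)
        j = i + offset
        run = 0
        while i < rows and j < cols:
            if plot[i][j]:
                run += 1
            else:
                if run >= min_len:
                    on_lines += run
                run = 0
            i += 1
            j += 1
        if run >= min_len:  # close a line that reaches the border
            on_lines += run
    return on_lines / total


def recurrence_distance(x: Series, y: Series, eps: float, min_len: int = 2) -> float:
    """Assumed distance: 1 - determinism of the joint cross recurrence plot."""
    return 1.0 - determinism(joint_cross_recurrence_plot(x, y, eps), min_len)


# x and y contain the same two segments in swapped order; z only shares
# isolated values with x, never a contiguous segment.
x = [[v] for v in (0, 1, 2, 3, 10, 11, 12, 13)]
y = [[v] for v in (10, 11, 12, 13, 0, 1, 2, 3)]
z = [[v] for v in (0, 20, 1, 30, 2, 40, 3, 50)]
d_xy = recurrence_distance(x, y, eps=0.5)
d_xz = recurrence_distance(x, z, eps=0.5)
```

Under these assumptions, the swapped segments in `x` and `y` still form two full diagonal lines in the plot, so the pair is treated as similar, whereas `z` produces only isolated recurrence points and is treated as dissimilar.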

The proposed RRR distance was evaluated on both synthetic and real-life time

series, and compared with the DTW distance. Our evaluation on synthetic data

demonstrates that the RRR distance is able to establish cluster centers that preserve

the characteristics of the (univariate and multivariate) sample time series. The results

on real-life vehicular data show that, in terms of our cost function, RRR performs

about 10% better than DTW, meaning that the determined prototypes contain 10%

more recurring driving behavior patterns.

In addition, we have introduced BestTime, a Matlab tool, which implements our

RRR distance to find time series representatives that best comprehend the recurring
