Discovery of Driving Behavior Patterns - Smart Information Systems: Computational Intelligence for Real-Life Applications


Fig. 12.8 Given a set of time series with previously unknown patterns, we aim to cluster the data and find a representative (highlighted) for each group

Minimum Number of Observations. Depending on the application, the user can optionally reduce the size of the dataset by specifying the minimum length of the time series that should be considered for further processing.
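As a minimal sketch of such a length filter, assuming the dataset is held as a plain list of sequences (the function name and parameter name below are hypothetical and not taken from the tool):

```python
def filter_by_min_length(dataset, min_observations):
    """Keep only the time series with at least `min_observations` points.

    `dataset` is assumed to be a list of sequences; this layout is an
    illustration, not the tool's actual data structure.
    """
    return [ts for ts in dataset if len(ts) >= min_observations]


dataset = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
kept = filter_by_min_length(dataset, min_observations=3)
print(len(kept))
```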

Data Reduction Rate. Since the computational complexity of our distance calculations is quadratic in the length of the time series, we offer the possibility to reduce the length via piecewise aggregate approximation [4]. Given a time series of length n and a reduction rate r, the approximate time series is of length n/r.
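The length reduction can be illustrated with a short sketch of piecewise aggregate approximation, in which each frame of r consecutive points is replaced by its mean. The function name `paa` and the handling of a trailing partial frame (it is dropped here) are assumptions made for illustration; the tool may treat lengths that are not a multiple of r differently.

```python
import numpy as np


def paa(series, rate):
    """Piecewise aggregate approximation: mean of each frame of `rate` points."""
    x = np.asarray(series, dtype=float)
    n_frames = len(x) // rate  # assumption: a trailing partial frame is dropped
    if n_frames == 0:
        raise ValueError("series is shorter than the reduction rate")
    # Reshape into (n_frames, rate) and average within each frame.
    return x[: n_frames * rate].reshape(n_frames, rate).mean(axis=1)


reduced = paa([1, 3, 2, 4, 10, 12, 0, 2], rate=2)
print(reduced)  # 8 points reduced to 8 / 2 = 4 frame means
```

Because the distance calculation is quadratic in the series length, a reduction rate of r shortens each series to n/r points and so cuts the cost of one pairwise comparison by roughly a factor of r squared.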

Minimum Pattern Length. As described in Sect. 12.9, the predetermined minimum pattern length l_min directly influences the time series similarity. This parameter strongly depends on the application and needs to be chosen by a domain expert.

Variable Selection. In case of time series datasets with multiple dimensions, the

user interface of our tool offers the possibility to select the variables that should

be considered for further analysis.

Similarity Threshold. This parameter directly influences the clustering result, which is usually very sensitive to it. Since it may be challenging to determine an appropriate similarity threshold for each variable, our tool can alternatively recommend (estimated) thresholds.

Parallel Computing. Calculating the distance matrix is costly for large datasets. However, this step is fully parallelized and runs almost n_CPU times faster than serial processing. Up to 12 parallel workers are supported.
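The reason this step parallelizes well is that every pairwise distance is independent of all others, so the pairs can be handed to a pool of workers. The sketch below illustrates the idea under several assumptions: the Euclidean distance is only a stand-in for the tool's own measure, the names `distance_matrix` and `MAX_WORKERS` are hypothetical, and a thread pool is used to keep the example self-contained. For a CPU-bound distance written in pure Python, a process pool (`concurrent.futures.ProcessPoolExecutor`, which offers the same `map` interface) would be needed to obtain a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
import math
import os

MAX_WORKERS = 12  # upper limit on parallel workers stated in the text


def euclidean(a, b):
    # Placeholder distance; the tool's own measure is not reproduced here.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(series, dist=euclidean, workers=None):
    """Fill a symmetric distance matrix by mapping pairs over a worker pool."""
    n = len(series)
    workers = min(workers or os.cpu_count() or 1, MAX_WORKERS)
    pairs = list(combinations(range(n), 2))  # upper triangle only
    matrix = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = pool.map(lambda p: dist(series[p[0]], series[p[1]]), pairs)
        for (i, j), d in zip(pairs, values):
            matrix[i][j] = matrix[j][i] = d  # distances are symmetric
    return matrix


data = [[0, 0], [3, 4], [6, 8]]
D = distance_matrix(data)
print(D[0][1], D[0][2])
```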

Quality Control. Our tool presents a colored plot of the computed distance matrix and a histogram of the distance distribution in order to ensure appropriate parameter settings as well as clusters that preserve the time series characteristics. Since both plots are updated iteratively during distance calculations, we can abort the computation as soon as the preview suggests undesired results. For the distance matrix, a high variance in the distances/colors indicates an appropriate parameter setting, whereas a low variance in the distances/colors may result in poor clustering. In general,
