Time Series Classification with Temporal
Bag-of-Words Model
Zi-Wen Gui 1 and Yi-Ren Yeh 2
1 Department of Computer Science and Information Engineering
National Taiwan University of Science and Technology, Taipei, Taiwan
2 Department of Applied Mathematics
Chinese Culture University, Taipei, Taiwan
Abstract. Time series classification has attracted increasing attention
in machine learning and data mining. In the analysis of time series data,
how the data are represented is a critical step for performance. Generally,
one can regard each time stamp as a feature dimension of a time series
instance. However, this naive representation might not be suitable for
data analysis due to over-fitting. To address this problem, we propose
a temporal bag-of-words representation for time series classification.
A codebook is generated from representative subsequences of the time
series data. Each time series instance is then encoded by the codebook,
which describes different local patterns of the time series. In our
experiments, we demonstrate that our proposed method achieves better
results than competitive methods.
Keywords: time series data, representation, classification, bag-of-words.
1 Introduction
Time series classification is an important task in many machine learning
applications, such as speech recognition and sensor data analysis. Compared
with other types of data, time series data suffer from high intra-class
variability, where patterns might be shifted in time. As a result, a naive
representation of time series (i.e., using each timestamp as a feature
dimension) might not be suitable for data analysis [5]. Dynamic Time Warping
(DTW) has been proposed to address the intra-class variations of time-series
patterns [2,9]. DTW measures the similarity between time series with an
automatic time alignment computed by a dynamic programming approach. However,
previous studies showed that using a high-level representation to measure
similarity is more appropriate for time series data [7,8,6,4]. Similar to
[7,6], our work is based on the Bag-of-Words (BoW) model, which aims to
represent an object by feature vectors of its sub-objects. It is also worth
noting that BoW representations are widely used in computer vision due to
their promising performance [1,3]. In our BoW model, subsequences of all
time series instances are extracted for feature learning (i.e., generating a
codebook). Once we obtain the codebook, a time series instance can be
represented by the distribution of the learned codewords (i.e., a histogram of
codewords). The framework is illustrated in Fig. 1. We first generate the codebook
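To make the pipeline concrete, the following is a minimal sketch of a temporal bag-of-words encoding: sliding-window subsequences are extracted from each series, a codebook is learned by clustering them, and each series is then represented as a histogram of its nearest codewords. The window length `w`, codebook size `k`, and the use of plain k-means (Lloyd's algorithm) are illustrative assumptions, not details fixed by the paper:

```python
import numpy as np

def extract_subsequences(series, w, step=1):
    """Slide a window of length w over a 1-D series (step controls the stride)."""
    return np.array([series[i:i + w]
                     for i in range(0, len(series) - w + 1, step)])

def learn_codebook(subseqs, k, iters=20, seed=0):
    """Plain k-means (an illustrative choice) to pick k representative subsequences."""
    rng = np.random.default_rng(seed)
    centers = subseqs[rng.choice(len(subseqs), k, replace=False)].astype(float).copy()
    for _ in range(iters):
        # Euclidean distance from every subsequence to every center.
        d = np.linalg.norm(subseqs[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = subseqs[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def encode(series, codebook, w):
    """Represent a series as a normalized histogram of nearest-codeword counts."""
    subseqs = extract_subsequences(series, w)
    d = np.linalg.norm(subseqs[:, None, :] - codebook[None, :, :], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The resulting fixed-length histograms can then be fed to any standard classifier; because counts of local patterns are aggregated over the whole series, moderate temporal shifts of a pattern change the histogram little, which is the property the naive per-timestamp representation lacks.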