Time Series Classification with Temporal
Bag-of-Words Model
Zi-Wen Gui 1 and Yi-Ren Yeh 2
1 Department of Computer Science and Information Engineering
National Taiwan University of Science and Technology, Taipei, Taiwan
2 Department of Applied Mathematics
Chinese Culture University, Taipei, Taiwan
Abstract. Time series classification has attracted increasing attention
in machine learning and data mining. In the analysis of time series data,
how the data are represented is a critical step for performance. Generally,
one can regard each time stamp as a feature dimension of a time series
instance. However, this naive representation might not be suitable for
data analysis due to over-fitting. To address this problem, we propose
a temporal bag-of-words representation for time series classification.
A codebook is generated from representative subsequences of the time
series data. Each time series instance is then encoded by the codebook,
which describes different local patterns of the time series. In our
experiments, we demonstrate that our proposed method achieves better
results than competitive methods.
Keywords: time series data, representation, classification, bag-of-words.
1 Introduction
Time series classification is an important task in many machine learning
applications, such as speech recognition and sensor data analysis. Compared
with other types of data, time series data suffer from high intra-class
variability, where patterns might be shifted in time. As a result, a naive
representation of time series (i.e., using each timestamp as a feature
dimension) might not be suitable for data analysis [5]. Dynamic Time Warping
(DTW) has been proposed to address the intra-class variations of time-series
patterns [2,9]. DTW measures the similarity between time series with an
automatic time alignment computed by a dynamic programming approach. However,
previous studies showed that using a high-level representation to measure
similarity is more appropriate for time series data [7,8,6,4]. Similar to
[7,6], our work is based on the Bag-of-Words (BoW) model, which aims to
represent an object by feature vectors of its sub-objects. It is also worth
noting that BoW representations are widely used in computer vision due to
their promising performance [1,3]. In our BoW model, subsequences of all
time series instances are extracted for feature learning (i.e., generating a
codebook). Once we obtain the codebook, a time series instance can be
represented by the distribution of the learned codewords (i.e., a histogram of
codewords). The framework is illustrated in Fig. 1. We first generate the codebook
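To make the pipeline concrete, the following is a minimal sketch of a temporal bag-of-words encoding: sliding-window subsequences are extracted from each series, a codebook is learned by clustering them, and each series is then represented as a histogram of its nearest codewords. The window length `w`, codebook size `k`, and the use of plain k-means (Lloyd's algorithm) are illustrative assumptions, not details fixed by the paper:

```python
import numpy as np

def extract_subsequences(series, w, step=1):
    """Slide a window of length w over a 1-D series (step controls the stride)."""
    return np.array([series[i:i + w]
                     for i in range(0, len(series) - w + 1, step)])

def learn_codebook(subseqs, k, iters=20, seed=0):
    """Plain k-means (an illustrative choice) to pick k representative subsequences."""
    rng = np.random.default_rng(seed)
    centers = subseqs[rng.choice(len(subseqs), k, replace=False)].astype(float).copy()
    for _ in range(iters):
        # Euclidean distance from every subsequence to every center.
        d = np.linalg.norm(subseqs[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = subseqs[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def encode(series, codebook, w):
    """Represent a series as a normalized histogram of nearest-codeword counts."""
    subseqs = extract_subsequences(series, w)
    d = np.linalg.norm(subseqs[:, None, :] - codebook[None, :, :], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The resulting fixed-length histograms can then be fed to any standard classifier; because counts of local patterns are aggregated over the whole series, moderate temporal shifts of a pattern change the histogram little, which is the property the naive per-timestamp representation lacks.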