novel in time-series classification; therefore, we will experimentally evaluate it and compare it to state-of-the-art time-series classifiers.
The remainder of this chapter is organized as follows: in Sect. 11.2 we formally define the time-series classification problem, summarize the basic notation used throughout this chapter, and briefly describe nearest neighbor classification. Section 11.3 is devoted to dynamic time warping, and Sect. 11.4 presents the hubness phenomenon. In Sect. 11.5 we describe state-of-the-art hubness-aware classifiers, followed by hubness-aware instance selection and feature construction approaches in Sect. 11.6. Finally, we conclude in Sect. 11.7.
11.2 Problem Formulation and Basic Notations
The problem of classification can be stated as follows. We are given a set of instances and some groups. The groups are called classes, and they are denoted as $C_1, \ldots, C_m$. Each instance $x$ belongs to one of the classes.¹ Whenever $x$ belongs to class $C_i$, we say that the class label of $x$ is $C_i$. We denote the set of all the classes by $\mathcal{C}$, i.e., $\mathcal{C} = \{C_1, \ldots, C_m\}$. Let $\mathcal{D}$ be a dataset of instances $x_i$ and their class labels $y_i$, i.e., $\mathcal{D} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$. We are given a dataset $\mathcal{D}^{\mathrm{train}}$, called training data. The task of classification is to induce a function $f(x)$, called classifier, which is able to assign class labels to instances not contained in $\mathcal{D}^{\mathrm{train}}$.
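In code, these objects can be represented directly. The following minimal Python sketch is merely illustrative; the type names are ours, not part of the chapter's notation:

    from typing import Callable, Sequence, Tuple, TypeVar

    X = TypeVar("X")  # an instance, e.g., a time series
    Y = TypeVar("Y")  # a class label, one of C_1, ..., C_m

    # A dataset D = {(x_1, y_1), ..., (x_n, y_n)} as (instance, label) pairs.
    Dataset = Sequence[Tuple[X, Y]]

    # A classifier is simply a function f mapping an instance to a class label.
    Classifier = Callable[[X], Y]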
In real-world applications, for some instances we know (from measurements
and/or historical data) to which classes they belong, while the class labels of other
instances are unknown. Based on the data with known classes, we induce a classifier,
and use it to determine the class labels of the rest of the instances.
In experimental settings we usually aim at measuring the performance of a classifier. Therefore, after inducing the classifier using $\mathcal{D}^{\mathrm{train}}$, we use a second dataset $\mathcal{D}^{\mathrm{test}}$, called test data: for the instances of $\mathcal{D}^{\mathrm{test}}$, we compare the output of the classifier, i.e., the predicted class labels, with the true class labels, and calculate the accuracy of classification. Therefore, the task of classification can be defined formally as follows: given two datasets $\mathcal{D}^{\mathrm{train}}$ and $\mathcal{D}^{\mathrm{test}}$, the task of classification is to induce a classifier $f(x)$ that maximizes prediction accuracy for $\mathcal{D}^{\mathrm{test}}$. For the induction of $f(x)$, however, solely $\mathcal{D}^{\mathrm{train}}$ can be used, but not $\mathcal{D}^{\mathrm{test}}$.
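As an illustration, prediction accuracy over $\mathcal{D}^{\mathrm{test}}$ can be computed as in the following Python sketch, which assumes the dataset representation introduced above; the function and variable names are ours:

    from typing import Any, Callable, Sequence, Tuple

    def accuracy(f: Callable[[Any], Any],
                 d_test: Sequence[Tuple[Any, Any]]) -> float:
        # Fraction of test instances (x, y) whose predicted label f(x)
        # equals the true label y.
        correct = sum(1 for x, y in d_test if f(x) == y)
        return correct / len(d_test)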
Next, we describe the $k$-nearest neighbor classifier ($k$NN). Suppose we are given an instance $x \in \mathcal{D}^{\mathrm{test}}$ that should be classified. The $k$NN classifier searches for those
$k$ instances of the training dataset that are most similar to $x$. These $k$ most similar instances are called the $k$ nearest neighbors of $x$. The $k$NN classifier considers the $k$ nearest neighbors, takes the majority vote of their labels, and assigns the majority label to $x$: e.g., if $k = 3$ and two of the nearest neighbors of $x$ belong to class $C_1$, while one of them belongs to another class, then $x$ is classified as an instance of $C_1$.
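A minimal Python sketch of this majority-vote rule follows, assuming instances are equal-length numeric sequences; Euclidean distance serves here only as a stand-in for the DTW distance discussed in Sect. 11.3, and all names are illustrative:

    import math
    from collections import Counter
    from typing import Callable, Sequence

    def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
        # Placeholder distance for equal-length numeric series; Sect. 11.3
        # motivates DTW as the distance of choice for time series.
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    def knn_classify(x, d_train, k=3, dist: Callable = euclidean):
        # Sort the training pairs (x_i, y_i) by their distance to x and
        # keep the k closest ones: the k nearest neighbors of x.
        neighbors = sorted(d_train, key=lambda pair: dist(x, pair[0]))[:k]
        # Majority vote over the neighbors' labels (ties broken arbitrarily).
        labels = [y for _, y in neighbors]
        return Counter(labels).most_common(1)[0][0]

For example, calling knn_classify([1.0, 2.0], [([1.1, 2.1], 'C1'), ([0.9, 1.8], 'C1'), ([5.0, 5.0], 'C2')]) returns 'C1', in line with the majority-vote example above.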
¹ In this chapter, we only consider the case when each instance belongs to exactly one class. Note, however, that the presence of hubs may be relevant in the context of multilabel and fuzzy classification as well.
 