Hubness-Aware Classification, Instance Selection and Feature Construction: Survey and Extensions to Time-Series - Feature Selection for Data and Pattern Recognition

Information Technology Reference

In-Depth Information

While instance selection is well explored for general nearest neighbor

classification, see e.g. [ 1 , 6 , 19 , 24 , 32 ], there are only few works for the case

of time series. Xi et al. [ 57 ] presented the FastAWARD approach and showed that it

outperforms state-of-the-art, general-purpose instance selection techniques applied

for time-series.

FastAWARD follows an iterative procedure for discarding time series: in each

iteration, the rank of all the time series is calculated and the one with lowest rank

is discarded. Thus, each iteration corresponds to a particular number of kept time

series. Furthermore, Xi et al. argue that the optimal warping window size depends

on the number of kept time series. Therefore, FastAWARD calculates the optimal

warping window size dependent on the number of kept time series.

In this section, we present a hubness-aware instance selection technique which

was originally introduced in [ 9 ]. This approach is simpler and therefore compu-

tationally much cheaper than FastAWARD while it selects better instances, i.e.,

instances that allowmore accurate classification of time-series than the ones selected

by FastAWARD.

In [ 9 ] coverage graphs were proposed tomodel instance selection, and the instance

selection problem was formulated as finding the appropriate subset of vertices of

the coverage graph. Furthermore, it was shown that maximizing the coverage is

NP-complete in general. On the other hand, for the case of time-series classification,

a simple approach performed surprisingly well. This approach is called In stance

S elect i on based on G raph-coverage and H ubness for T ime-series or INSIGHT.

INSIGHT performs instance selection by assigning a score to each instance and

selecting instances with the highest scores (see Algorithm 4), therefore the “intel-

ligence” of INSIGHT is hidden in the applied score function. Next, we explain the

suitability of several score functions in the light of the hubness property.

•

Good 1-occurrence Score— INSIGHT can use scores that take into account how

many times an instance appears as good neighbor of other instances. Thus, a simple

score function is the good 1-occurrence score GN 1 (

)

•

Relative Score— While x is being a good hub, at the same time it may appear as

bad neighbor of several other instances. Thus, INSIGHT can also consider scores

that take bad occurrences into account. This leads to scores that relate the good

occurrence of an instance x to either its total occurrence or to its bad occurrence.

For simplicity, we focus on the following relative score , however, other variations

could be used too: relative score RS

of a time series x is the fraction of good

1-occurrences and total occurrences plus one (to avoid division by zero):

(

)

GN 1 (

)

(

) =

1 .

(11.22)

N 1 (

) +

•

Xi's score— Notably, GN k (

)

and BN k (

)

allows us to interpret the ranking

criterion used by Xi et al. in FastAWARD [ 57 ] as another form of score for relative

hubness:

(

) =

GN 1 (

) −

2 BN 1 (

(11.23)

Feature Selection for Data and Pattern Recognition

Search WWH ::

Custom Search

Home