Hubness-Aware Classification, Instance Selection and Feature Construction: Survey and Extensions to Time-Series - Feature Selection for Data and Pattern Recognition - page 232


Fig. 11.4 Distribution of $GN_1(x)$ for some time-series datasets (panels: SonyAIBO, RobotSurface, CBF, FacesUCR). The horizontal axis corresponds to the values of $GN_1(x)$, while the vertical axis shows how many instances have that value.

hubness-aware classifiers, the ones we present in the next section, are also relevant

for the classification of imbalanced data.

The total occurrence count of an instance $x$ can be decomposed into good and bad occurrence counts: $N_k(x) = GN_k(x) + BN_k(x)$. More generally, we can decompose the total occurrence count into the class-conditional counts:

$$N_k(x) = \sum_{C \in \mathcal{C}} N_{k,C}(x)$$

where $N_{k,C}(x)$ denotes how many times $x$ occurs as one of the $k$ nearest neighbors of instances belonging to class $C$, i.e.,

$$N_{k,C}(x) = \left|\{\, x_i \mid x \in N_k(x_i) \wedge y_i = C \,\}\right| \qquad (11.8)$$

where $y_i$ denotes the class label of $x_i$.
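A minimal sketch of how the counts $N_k(x)$, $GN_k(x)$, $BN_k(x)$ and $N_{k,C}(x)$ could be computed by brute force. It assumes Euclidean distance (time series are often compared with other measures, such as dynamic time warping, which could be swapped in for `math.dist`), excludes each point from its own neighbor list, and breaks distance ties by index. The helper names `knn_indices` and `occurrence_counts` are hypothetical.

```python
import math
from collections import Counter

def knn_indices(X, k):
    """For each point i, return the indices of its k nearest neighbors
    (Euclidean distance, the point itself excluded, ties broken by index)."""
    neighbors = []
    for i, xi in enumerate(X):
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def occurrence_counts(X, y, k):
    """Return N_k, GN_k, BN_k (lists) and N_{k,C} (one Counter per point)."""
    n = len(X)
    N, GN, BN = [0] * n, [0] * n, [0] * n
    NC = [Counter() for _ in range(n)]  # NC[j][C] = N_{k,C}(x_j), Eq. (11.8)
    for i, nbrs in enumerate(knn_indices(X, k)):
        for j in nbrs:            # x_j occurs among the k nearest neighbors of x_i
            N[j] += 1
            NC[j][y[i]] += 1      # counted toward the class of x_i
            if y[i] == y[j]:
                GN[j] += 1        # good occurrence: labels agree
            else:
                BN[j] += 1        # bad occurrence: labels differ
    return N, GN, BN, NC

# Toy 1-D example with k = 1
X = [(0.0,), (1.0,), (2.0,), (10.0,)]
y = ["a", "a", "b", "b"]
N, GN, BN, NC = occurrence_counts(X, y, k=1)
print(N)            # [1, 2, 1, 0]
print(GN)           # [1, 1, 1, 0]
print(BN)           # [0, 1, 0, 0]
print(dict(NC[1]))  # {'a': 1, 'b': 1}
```

By construction, $N_k(x) = GN_k(x) + BN_k(x)$ and $N_k(x) = \sum_C N_{k,C}(x)$ hold for every point. Tallying the values in `GN` for $k = 1$ gives the kind of histogram shown in Fig. 11.4; on this toy input, three instances have $GN_1(x) = 1$ and one has $GN_1(x) = 0$.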

As we mentioned, hubs appear in data with high (intrinsic) dimensionality; therefore, hubness is one of the main aspects of the curse of dimensionality [4]. However, dimensionality reduction cannot entirely eliminate the issue of bad hubs unless it induces significant information loss by reducing the data to a very low-dimensional space, which often ends up hurting system performance even more [40].
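One common way to quantify hubness, assumed here because this section does not define it, is the skewness of the $N_k(x)$ distribution: a long right tail means that a few points appear in very many neighbor lists. The sketch below, with hypothetical helper names, draws i.i.d. uniform points in $[0,1]^d$ and reports that skewness for a low and a high dimensionality. The hubness literature reports that it tends to grow with intrinsic dimensionality, but the exact values depend on $n$, $k$ and the random seed, so no particular numbers are claimed.

```python
import math
import random

def k_occurrences(X, k):
    """N_k(x) for every point: how often it appears among the k nearest
    (Euclidean) neighbors of the other points."""
    N = [0] * len(X)
    for i, xi in enumerate(X):
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        for _, j in dists[:k]:
            N[j] += 1
    return N

def skewness(values):
    """Population skewness E[(v - mu)^3] / sigma^3; 0.0 for constant input."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    third = sum((v - mu) ** 3 for v in values) / n
    return third / var ** 1.5

def hubness_skew(n, d, k, seed=0):
    """Skewness of N_k for n i.i.d. uniform points in [0, 1]^d (synthetic data)."""
    rng = random.Random(seed)
    X = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    N = k_occurrences(X, k)
    return N, skewness(N)

for d in (2, 100):
    N, s = hubness_skew(n=300, d=d, k=5)
    # The mean of N_k is always exactly k: there are n*k occurrences in total.
    print(f"d={d:3d}  mean N_k={sum(N) / len(N):.1f}  skew={s:.2f}  max N_k={max(N)}")
```

Brute-force neighbor search keeps the sketch dependency-free; for realistic dataset sizes, an indexed or vectorized search would be preferable.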

11.5 Hubness-Aware Classification of Time-Series

Since hubness cannot be entirely avoided in intrinsically high-dimensional data such as time-series, algorithms that work with such data need to be able to handle hubs properly. Therefore, in this section, we present algorithms that work under the assumption of hubness. These mechanisms may be either explicit or implicit.

Several hubness-aware classification methods have recently been proposed. An instance-weighting scheme that reduces the bad influence of hubs during voting was first proposed in [43]. An extension of the fuzzy $k$-nearest neighbor framework was shown to be somewhat better on average [54], introducing the concept of class-conditional hubness of neighbor points and building an occurrence model which is
