11.5.4 HIKNN: Hubness Information k-Nearest Neighbor
In h-FNN, as in most kNN classifiers, all neighbors are treated as equally important. A distinction is sometimes introduced via a dependency on the distance to x, the instance to be classified. However, it is also possible to derive a global measure of neighbor relevance from the occurrence model, and this is what HIKNN is based on [50]. It embodies an information-theoretic interpretation of the neighbor occurrence events. In that context, rare occurrences carry higher self-information, see (11.17), and these more informative instances are favored by the algorithm. The reasons for this lie in the geometry of high-dimensional feature spaces. Hubs have been shown to lie closer to the cluster centers [55], as most high-dimensional data lies approximately on hyper-spheres, which makes hubs somewhat less 'local' points. Favoring the rarely occurring points therefore helps to consolidate the locality of the neighbor set. The algorithm itself is somewhat more complex, as it not only reduces the vote weights based on the occurrence frequencies, but also modifies the fuzzy vote itself, so that rarely occurring points vote mostly by their labels and hub points vote mostly by their occurrence profiles. Next, we present the approach in more detail.
The self-information $I_{x_i}$ associated with the event that $x_i$ occurs as one of the nearest neighbors of an instance to be classified can be calculated as

$$I_{x_i} = \log \frac{1}{P(x_i \in N_k)}, \qquad P(x_i \in N_k) \approx \frac{N_k(x_i)}{|D_{\mathrm{train}}|}. \qquad (11.17)$$
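As a concrete illustration, the following Python sketch computes these self-information values from precomputed neighbor occurrence counts. The function name, the NumPy-based interface, and the Laplace-style smoothing that guards against $N_k(x_i) = 0$ are choices made for this sketch, not part of the original formulation:

import numpy as np

def self_information(occurrence_counts, n_train):
    # occurrence_counts[i] = N_k(x_i): how often x_i appears among the k
    # nearest neighbors of the training points; n_train = |D_train|.
    # The +1 smoothing is an assumption added here so that points that
    # never occur as neighbors do not produce log(1/0).
    p_occurrence = (occurrence_counts + 1.0) / (n_train + 1.0)
    return np.log(1.0 / p_occurrence)          # I_{x_i}, Eq. (11.17)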
Occurrence self-information is used to define the relative and absolute relevance
factors in the following way:
$$\alpha(x_i) = \frac{I_{x_i} - \min_{x_j \in N_k(x)} I_{x_j}}{\log |D_{\mathrm{train}}| - \min_{x_j \in N_k(x)} I_{x_j}}, \qquad \beta(x_i) = \frac{I_{x_i}}{\log |D_{\mathrm{train}}|}. \qquad (11.18)$$
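Continuing the sketch above, a minimal implementation of the two relevance factors could look as follows, assuming the self-information values of the current neighbor set are already available as a NumPy array (the small epsilon protecting against a zero denominator is an addition of this sketch):

def relevance_factors(info_neighbors, n_train):
    # info_neighbors[j] = I_{x_j} for the k neighbors of the query x.
    i_min = info_neighbors.min()                 # least informative neighbor
    i_max = np.log(n_train)                      # log |D_train|, maximal self-information
    alpha = (info_neighbors - i_min) / (i_max - i_min + 1e-12)   # relative relevance, Eq. (11.18)
    beta = info_neighbors / i_max                # absolute relevance, Eq. (11.18)
    return alpha, beta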
The final fuzzy vote of a neighbor $x_i$ combines the information contained in its label with the information contained in its occurrence profile. The relative relevance factor is used for weighting the two information sources. This is shown in (11.19).
$$P_k(y = C \mid x_i) = \begin{cases} \alpha(x_i) + (1 - \alpha(x_i)) \cdot u_C(x_i), & y_i = C \\ (1 - \alpha(x_i)) \cdot u_C(x_i), & y_i \neq C \end{cases} \qquad (11.19)$$
where $y_i$ denotes the class label of $x_i$; for the definition of $u_C(x_i)$, see (11.10).
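The hybrid vote of (11.19) can be sketched as follows; occurrence_profile_i stands for the vector of $u_C(x_i)$ values over all classes and label_i for the index of $y_i$ (both names are specific to this sketch):

def fuzzy_vote(alpha_i, label_i, occurrence_profile_i):
    # Profile part, (1 - alpha) * u_C(x_i), contributes to every class C.
    vote = (1.0 - alpha_i) * occurrence_profile_i
    # Label part, alpha, contributes only to the neighbor's own class y_i.
    vote[label_i] += alpha_i
    return vote                                  # P_k(y = C | x_i) for all C, Eq. (11.19)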
The final class assignments are given by the weighted sum of these fuzzy votes. The final vote of class C for the classification of instance x is shown in (11.20). The distance weighting factor $d_w(x_i)$ yields mostly minor improvements and can be left out in practice, see [54] for more details.

$$u_C(x) \propto \sum_{x_i \in N_k(x)} d_w(x_i) \cdot P_k(y = C \mid x_i). \qquad (11.20)$$
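Putting the pieces together, a possible end-to-end aggregation of the neighbor votes is sketched below, reusing the helpers defined above. The inverse-distance form of $d_w(x_i)$ is only a placeholder for the weighting scheme discussed in [54], and the array layout (per-point occurrence counts, labels, and occurrence profiles) is an assumption of this sketch.

def hiknn_class_votes(neighbor_idx, neighbor_dist, labels, occ_counts,
                      occ_profiles, n_train, n_classes,
                      use_distance_weights=False):
    # neighbor_idx / neighbor_dist: indices and distances of the k nearest
    # neighbors of the query x; occ_profiles[i] holds u_C(x_i) for all C.
    info = self_information(occ_counts[neighbor_idx], n_train)
    alpha, _ = relevance_factors(info, n_train)
    # Placeholder distance weighting d_w(x_i); it can simply be dropped,
    # as suggested in the text.
    if use_distance_weights:
        d_w = 1.0 / (neighbor_dist + 1e-12)
    else:
        d_w = np.ones(len(neighbor_idx))
    votes = np.zeros(n_classes)
    for j, i in enumerate(neighbor_idx):
        votes += d_w[j] * fuzzy_vote(alpha[j], labels[i], occ_profiles[i])
    return votes / votes.sum()                   # normalized class votes, Eq. (11.20)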