11.5.4 HIKNN: Hubness Information k-Nearest Neighbor
In h-FNN, as in most kNN classifiers, all neighbors are treated as equally important. A distinction is sometimes introduced via a dependency on the distance to x, the instance to be classified. However, it is also possible to derive a global measure of neighbor relevance from the occurrence model, and this is what HIKNN is based on [50]. It embodies an information-theoretic interpretation of the neighbor occurrence events. In that context, rare occurrences carry higher self-information, see (11.17), and these more informative instances are favored by the algorithm. The reasons for this lie in the geometry of high-dimensional feature spaces. Hubs have been shown to lie closer to the cluster centers [55], as most high-dimensional data lies approximately on hyper-spheres, which makes hubs somewhat less 'local' points. Favoring the rarely occurring points therefore helps to consolidate the locality of the neighbor set. The algorithm itself is somewhat more complex, as it not only reduces the vote weights based on the occurrence frequencies, but also modifies the fuzzy vote itself, so that rarely occurring points vote mostly by their labels and hub points vote mostly by their occurrence profiles. Next, we present the approach in more detail.
The self-information $I_{x_i}$ associated with the event that $x_i$ occurs as one of the nearest neighbors of an instance to be classified can be calculated as

$$I_{x_i} = \log \frac{1}{P(x_i \in N_k)}, \qquad P(x_i \in N_k) \approx \frac{N_k(x_i)}{|D_{\mathrm{train}}|}. \qquad (11.17)$$
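As a concrete illustration, the following Python sketch computes these self-information values from precomputed neighbor occurrence counts. The function name, the NumPy-based interface, and the Laplace-style smoothing that guards against $N_k(x_i) = 0$ are choices made for this sketch, not part of the original formulation:

import numpy as np

def self_information(occurrence_counts, n_train):
    # occurrence_counts[i] = N_k(x_i): how often x_i appears among the k
    # nearest neighbors of the training points; n_train = |D_train|.
    # The +1 smoothing is an assumption added here so that points that
    # never occur as neighbors do not produce log(1/0).
    p_occurrence = (occurrence_counts + 1.0) / (n_train + 1.0)
    return np.log(1.0 / p_occurrence)          # I_{x_i}, Eq. (11.17)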
Occurrence self-information is used to define the relative and absolute relevance
factors in the following way:
$$\alpha(x_i) = \frac{I_{x_i} - \min_{x_j \in N_k(x)} I_{x_j}}{\log |D_{\mathrm{train}}| - \min_{x_j \in N_k(x)} I_{x_j}}, \qquad \beta(x_i) = \frac{I_{x_i}}{\log |D_{\mathrm{train}}|}. \qquad (11.18)$$
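Continuing the sketch above, a minimal implementation of the two relevance factors could look as follows, assuming the self-information values of the current neighbor set are already available as a NumPy array (the small epsilon protecting against a zero denominator is an addition of this sketch):

def relevance_factors(info_neighbors, n_train):
    # info_neighbors[j] = I_{x_j} for the k neighbors of the query x.
    i_min = info_neighbors.min()                 # least informative neighbor
    i_max = np.log(n_train)                      # log |D_train|, maximal self-information
    alpha = (info_neighbors - i_min) / (i_max - i_min + 1e-12)   # relative relevance, Eq. (11.18)
    beta = info_neighbors / i_max                # absolute relevance, Eq. (11.18)
    return alpha, beta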
The final fuzzy vote of a neighbor $x_i$ combines the information contained in its label with the information contained in its occurrence profile. The relative relevance factor is used for weighting the two information sources. This is shown in (11.19).
$$P_k(y = C \mid x_i) = \begin{cases} \alpha(x_i) + (1 - \alpha(x_i)) \cdot u_C(x_i), & y_i = C \\ (1 - \alpha(x_i)) \cdot u_C(x_i), & y_i \neq C \end{cases} \qquad (11.19)$$
where $y_i$ denotes the class label of $x_i$; for the definition of $u_C(x_i)$, see (11.10).
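The hybrid vote of (11.19) can be sketched as follows; occurrence_profile_i stands for the vector of $u_C(x_i)$ values over all classes and label_i for the index of $y_i$ (both names are specific to this sketch):

def fuzzy_vote(alpha_i, label_i, occurrence_profile_i):
    # Profile part, (1 - alpha) * u_C(x_i), contributes to every class C.
    vote = (1.0 - alpha_i) * occurrence_profile_i
    # Label part, alpha, contributes only to the neighbor's own class y_i.
    vote[label_i] += alpha_i
    return vote                                  # P_k(y = C | x_i) for all C, Eq. (11.19)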
The final class assignments are given by the weighted sum of these fuzzy votes. The final vote of class C for the classification of instance x is shown in (11.20). The distance weighting factor $d_w(x_i)$ yields mostly minor improvements and can be left out in practice, see [54] for more details.

$$u_C(x) \propto \sum_{x_i \in N_k(x)} d_w(x_i) \cdot P_k(y = C \mid x_i). \qquad (11.20)$$
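Putting the pieces together, a possible end-to-end aggregation of the neighbor votes is sketched below, reusing the helpers defined above. The inverse-distance form of $d_w(x_i)$ is only a placeholder for the weighting scheme discussed in [54], and the array layout (per-point occurrence counts, labels, and occurrence profiles) is an assumption of this sketch.

def hiknn_class_votes(neighbor_idx, neighbor_dist, labels, occ_counts,
                      occ_profiles, n_train, n_classes,
                      use_distance_weights=False):
    # neighbor_idx / neighbor_dist: indices and distances of the k nearest
    # neighbors of the query x; occ_profiles[i] holds u_C(x_i) for all C.
    info = self_information(occ_counts[neighbor_idx], n_train)
    alpha, _ = relevance_factors(info, n_train)
    # Placeholder distance weighting d_w(x_i); it can simply be dropped,
    # as suggested in the text.
    if use_distance_weights:
        d_w = 1.0 / (neighbor_dist + 1e-12)
    else:
        d_w = np.ones(len(neighbor_idx))
    votes = np.zeros(n_classes)
    for j, i in enumerate(neighbor_idx):
        votes += d_w[j] * fuzzy_vote(alpha[j], labels[i], occ_profiles[i])
    return votes / votes.sum()                   # normalized class votes, Eq. (11.20)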