Fig. 11.5 The hubness-aware analytic framework: learning from past neighbor occurrences
used in classification. This approach was further improved by considering the self-information of individual neighbor occurrences [50]. If the neighbor occurrences are treated as random events, Bayesian approaches also become possible [52, 53]. Generally speaking, in order to predict how hubs will affect the classification of unlabeled instances (e.g., instances arising from future observations), we can model the influence of hubs by considering the training data. The training data can be used to learn a neighbor occurrence model, which can then be used to estimate the probability of individual neighbor occurrences for each class. This is summarized in Fig. 11.5. There are many ways to exploit the information contained in the occurrence models. Next, we review the most prominent approaches.
While describing these approaches, we will consider the case of classifying an instance x, and we will denote its nearest neighbors as x_i, i ∈ {1, ..., k}. We assume that the test data is not available when building the model, and therefore N_k(x), N_{k,C}(x), GN_k(x), BN_k(x) are calculated on the training data.
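To make this concrete, the following sketch shows one way these occurrence counts could be computed on the training data. It is only an illustration under stated assumptions: it uses numpy and scikit-learn's NearestNeighbors, and the helper name occurrence_counts is invented here, not taken from the chapter.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def occurrence_counts(X_train, y_train, k=5):
    """Count k-occurrences of each training instance on the training data.

    Returns N_k (total occurrences), GN_k (good occurrences: the querying
    instance has the same label), BN_k (bad occurrences: different label)
    and N_kc (class-conditional occurrence counts).
    """
    n = X_train.shape[0]
    classes = np.unique(y_train)
    # k + 1 neighbors are queried because the first returned neighbor of a
    # training point is (typically) the point itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_train)
    _, idx = nn.kneighbors(X_train)

    N_k = np.zeros(n, dtype=int)
    GN_k = np.zeros(n, dtype=int)
    BN_k = np.zeros(n, dtype=int)
    N_kc = np.zeros((n, len(classes)), dtype=int)  # rows: instances, columns: classes

    for i in range(n):            # i indexes the querying instance
        for j in idx[i, 1:]:      # j indexes one of its k nearest neighbors
            N_k[j] += 1
            c = np.searchsorted(classes, y_train[i])
            N_kc[j, c] += 1
            if y_train[j] == y_train[i]:
                GN_k[j] += 1      # good occurrence
            else:
                BN_k[j] += 1      # bad occurrence
    return N_k, GN_k, BN_k, N_kc, classes
```

With such counts, N_kc[j, c] / max(N_k[j], 1) gives a simple estimate of the probability that an occurrence of instance j happens in the neighborhood of a class-c instance, which is the kind of occurrence model referred to above.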
11.5.1 hw-kNN: Hubness-Aware Weighting
The weighting algorithm proposed by Radovanović et al. [41] is one of the simplest ways to reduce the influence of bad hubs. They assign lower voting weights to bad hubs in the nearest neighbor classifier. In hw-kNN, the vote of each neighbor x_i is weighted by e^{-h_b(x_i)}, where

$$ h_b(x_i) = \frac{BN_k(x_i) - \mu_{BN_k(x)}}{\sigma_{BN_k(x)}} \qquad (11.9) $$

is the standardized bad hubness score of the neighbor instance x_i ∈ N_k(x), while μ_{BN_k(x)} and σ_{BN_k(x)} denote the mean and standard deviation of the distribution of BN_k(x).
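A minimal sketch of this voting rule is given below, building on the occurrence_counts helper and the imports from the previous sketch (again, the function name hw_knn_predict and the guard against a zero standard deviation are assumptions of this text, not details from [41]).

```python
def hw_knn_predict(X_train, y_train, X_test, k=5):
    """hw-kNN: k-NN voting in which the vote of each neighbor x_i is
    weighted by exp(-h_b(x_i)), with h_b as in Eq. (11.9)."""
    N_k, GN_k, BN_k, _, _ = occurrence_counts(X_train, y_train, k)

    # Standardized bad hubness of every training instance (Eq. 11.9).
    std = BN_k.std()
    std = std if std > 0 else 1.0   # avoid division by zero when there are no bad occurrences
    h_b = (BN_k - BN_k.mean()) / std
    weights = np.exp(-h_b)          # bad hubs receive smaller voting weights

    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_test)

    predictions = []
    for neighbors in idx:           # k nearest training neighbors of one test instance
        votes = {}
        for j in neighbors:
            votes[y_train[j]] = votes.get(y_train[j], 0.0) + weights[j]
        predictions.append(max(votes, key=votes.get))
    return np.array(predictions)
```

Note that the weights depend only on the training data, so they can be precomputed once and reused for every test instance.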
Example 1 We illustrate the calculation of N_k(x), GN_k(x), BN_k(x) and the hw-kNN approach on the example shown in Fig. 11.6. As described previously, hubness primarily characterizes high-dimensional data. However, in order to keep it simple, this illustrative example is taken from the domain of low-dimensional vector classification. In particular, the instances are two-dimensional; therefore, they can be mapped to points of the plane, as shown in Fig. 11.6. Circles (instances 1-6) and