Fig. 11.7 The skewness of the neighbor occurrence frequency distribution for neighborhood sizes k = 1 and k = 10. In both figures, each column corresponds to a dataset of the UCR repository. The figures show the change in the skewness when k is increased from 1 to 10
\mathrm{RImb} = \sqrt{\dfrac{\sum_{c \in C} \bigl(P(c) - 1/m\bigr)^{2}}{(m-1)/m}}.    (11.21)
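To make Eq. (11.21) concrete, the relative imbalance factor can be computed directly from a list of class labels. The sketch below is a minimal illustration; the function name relative_imbalance and the NumPy-based label handling are choices of this example, not part of the original text.

```python
import numpy as np

def relative_imbalance(labels):
    """Relative imbalance factor RImb of a label distribution, Eq. (11.21).

    Returns 0 for a perfectly balanced distribution and 1 when all
    examples belong to a single class (m >= 2 classes assumed).
    """
    labels = np.asarray(labels)
    _, counts = np.unique(labels, return_counts=True)
    m = len(counts)                    # number of classes
    p = counts / counts.sum()          # relative class frequencies P(c)
    # Squared deviation from the uniform distribution, normalized by its
    # maximum possible value (m - 1) / m.
    return float(np.sqrt(np.sum((p - 1.0 / m) ** 2) / ((m - 1) / m)))
```

For example, a two-class dataset with a 3:1 class ratio gives RImb = 0.5.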
In general, an occurrence frequency distribution skewness above 1 indicates a
significant impact of hubness. Many UCR datasets have S_{N_1(x)} > 1, which means that the first nearest neighbor occurrence distribution is significantly skewed to the right. However, an increase in neighborhood size reduces the overall skewness of the datasets, as shown in Fig. 11.7. Note that only a few datasets have S_{N_{10}(x)} > 1,
though some non-negligible skewness remains in most of the data. Yet, even though
the overall skewness is reduced with increasing neighborhood sizes, the degree of
major hubs in the data increases. This leads to the emergence of strong centers of
influence.
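Since the skewness values discussed here are derived from the k-occurrence counts N_k(x), a brief sketch of how such values can be obtained may be helpful. The example below uses a scikit-learn NearestNeighbors index with its default Euclidean metric purely for illustration (the experiments in this chapter rely on time-series distances such as DTW), and the function name is this example's own.

```python
import numpy as np
from scipy.stats import skew
from sklearn.neighbors import NearestNeighbors

def k_occurrence_skewness(X, k=10):
    """Skewness of the k-occurrence distribution N_k(x) of dataset X.

    N_k(x) counts how many times x appears among the k nearest neighbors
    of the other points; a strongly right-skewed N_k distribution is the
    hallmark of hubness.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    # Drop the first neighbor column: each point is its own nearest neighbor.
    neighbors = nn.kneighbors(X, return_distance=False)[:, 1:]
    n_k = np.bincount(neighbors.ravel(), minlength=X.shape[0])
    return float(skew(n_k))
```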
We evaluated the performance of different kNN classification methods on time series data for a fixed neighborhood size of k = 10. A slightly larger k value
was chosen, since most hubness-aware methods are known to perform better in
such cases, as better and more reliable neighbor occurrence models can be inferred
from more occurrence information. We also analyzed the algorithm performance
over a range of different neighborhood sizes, as shown in Fig. 11.8. The hubness-aware classification methods presented in the previous sections (hw-kNN, NHBNN, h-FNN and HIKNN) were compared to the baseline kNN [15] and the adaptive kNN (AKNN) [56], where the neighborhood size is recalculated for each query point based on initial observations, in order to consult only the relevant neighbor points. AKNN does not take the hubness of the data into account.
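As a reminder of how the simplest of these methods turns occurrence statistics into votes, the sketch below shows hw-kNN-style weighted voting, where each neighbor's vote is scaled by exp(-h_b(x)), with h_b(x) the standardized bad k-occurrence count. The helper functions and array layout are assumptions of this illustration, not code from the cited implementations.

```python
import numpy as np

def standardized_bad_hubness(bad_counts):
    """h_b(x) = (BN_k(x) - mean) / std, where BN_k(x) is the number of times
    x appears as a label-mismatched ('bad') k-nearest neighbor on the
    training set."""
    bad_counts = np.asarray(bad_counts, dtype=float)
    return (bad_counts - bad_counts.mean()) / bad_counts.std()

def hw_knn_vote(neighbor_labels, neighbor_h_b, n_classes):
    """hw-kNN-style vote: neighbors with high bad hubness receive
    exponentially smaller weights, so detrimental hubs influence the
    decision less."""
    weights = np.exp(-np.asarray(neighbor_h_b))
    votes = np.zeros(n_classes)
    for label, w in zip(neighbor_labels, weights):
        votes[label] += w
    return int(np.argmax(votes))
```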
The tests were run according to the 10-times 10-fold cross-validation protocol and the statistical significance was determined by employing the corrected re-sampled t-test. The detailed results are given in Table 11.4.
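For reference, the corrected re-sampled t-test replaces the naive variance term with one that accounts for the overlap of training sets across repeated cross-validation runs (the Nadeau-Bengio correction). A minimal sketch, assuming the 100 per-fold accuracy differences from the 10-times 10-fold protocol are already collected, is:

```python
import numpy as np
from scipy import stats

def corrected_resampled_ttest(diffs, test_train_ratio=1.0 / 9.0):
    """Corrected re-sampled t-test for r-times k-fold cross-validation.

    diffs            : per-fold accuracy differences between two classifiers
                       (length r * k; 10 * 10 = 100 in this protocol)
    test_train_ratio : n_test / n_train, i.e. 1/9 for 10-fold CV
    """
    diffs = np.asarray(diffs, dtype=float)
    n = len(diffs)
    variance = diffs.var(ddof=1)
    # Inflate the variance term by the test/train ratio to compensate for
    # the dependence between folds that share training examples.
    t_stat = diffs.mean() / np.sqrt((1.0 / n + test_train_ratio) * variance)
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
    return t_stat, p_value
```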
The adaptive neighborhood approach (AKNN) does not seem to be appropriate
for handling time-series data, as it performs worse than the baseline kNN. While hw-kNN, NHBNN and h-FNN are better than the baseline kNN in some cases, they do not offer a significant advantage overall, which is probably a consequence of the relatively low neighbor occurrence skewness for k = 10 (see Fig. 11.7). The hubness
 