Fig. 11.4 Distribution of $GN_1(x)$ for some time series datasets (panels: SonyAIBO, RobotSurface, CBF, FacesUCR). The horizontal axis corresponds to the values of $GN_1(x)$, while on the vertical axis one can see how many instances have that value.
hubness-aware classifiers, which we present in the next section, are also relevant
for the classification of imbalanced data.
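A histogram like those in Fig. 11.4 can, in principle, be obtained by counting, for every instance, how often it is the nearest neighbor of an instance with the same label. The sketch below assumes Euclidean distance (time-series work often uses DTW instead), breaks distance ties by the lowest index, and uses a hypothetical helper name `gn1_distribution`; it is an illustration, not the procedure used to produce the figure.

```python
from collections import Counter
from math import dist


def gn1_distribution(X, y):
    """Return a Counter mapping each value g to the number of instances x with GN_1(x) = g.

    X is a list of equal-length numeric sequences (e.g. time series), y the class labels.
    """
    n = len(X)
    gn1 = [0] * n  # gn1[j] = GN_1(x_j)
    for i in range(n):
        # Nearest neighbor of x_i under Euclidean distance (assumed metric);
        # min() keeps the first, i.e. lowest-index, candidate when distances tie.
        nn = min((c for c in range(n) if c != i), key=lambda c: dist(X[i], X[c]))
        if y[i] == y[nn]:
            gn1[nn] += 1  # good 1-NN occurrence: labels of x_i and its neighbor agree
    return Counter(gn1)


# Toy example: four one-dimensional "series" from two classes.
X = [(0.0,), (1.0,), (2.5,), (10.0,)]
y = ["a", "a", "b", "b"]
print(sorted(gn1_distribution(X, y).items()))  # [(0, 1), (1, 3)]
```

Plotting the counts (value on the horizontal axis, number of instances on the vertical axis) gives a chart of the kind shown in Fig. 11.4.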
The total occurrence count of an instance $x$ can be decomposed into good and bad occurrence counts: $N_k(x) = GN_k(x) + BN_k(x)$. More generally, we can decompose the total occurrence count into the class-conditional counts:

$$N_k(x) = \sum_{C \in \mathcal{C}} N_{k,C}(x),$$

where $N_{k,C}(x)$ denotes how many times $x$ occurs as one of the $k$ nearest neighbors of instances belonging to class $C$, i.e.,

$$N_{k,C}(x) = \left|\{\, x_i \mid x \in N_k(x_i) \wedge y_i = C \,\}\right| \qquad (11.8)$$

where $y_i$ denotes the class label of $x_i$, $\mathcal{C}$ is the set of class labels, and $N_k(x_i)$ here stands for the set of the $k$ nearest neighbors of $x_i$.
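The two decompositions can be checked directly on a small dataset. The following is a minimal sketch, assuming Euclidean distance, brute-force neighbor search, and index-order tie breaking; the function names `knn_indices` and `occurrence_counts` are hypothetical. Each occurrence of $x_j$ in the neighborhood of $x_i$ is credited to class $y_i$, as in Eq. (11.8).

```python
from collections import Counter
from math import dist


def knn_indices(X, k):
    """For each instance x_i, return the indices of its k nearest neighbors (excluding x_i)."""
    n = len(X)
    result = []
    for i in range(n):
        # sorted() is stable, so equally distant candidates keep their index order
        others = sorted((c for c in range(n) if c != i), key=lambda c: dist(X[i], X[c]))
        result.append(others[:k])
    return result


def occurrence_counts(X, y, k):
    """Compute N_k(x), GN_k(x), BN_k(x) and the class-conditional counts N_{k,C}(x)."""
    n = len(X)
    N, GN, BN = [0] * n, [0] * n, [0] * n
    N_C = [Counter() for _ in range(n)]  # N_C[j][C] = N_{k,C}(x_j)
    for i, neighbors in enumerate(knn_indices(X, k)):
        for j in neighbors:  # x_j belongs to the k-neighborhood of x_i
            N[j] += 1
            N_C[j][y[i]] += 1  # credited to the label y_i of the query x_i
            if y[i] == y[j]:
                GN[j] += 1  # good occurrence: labels agree
            else:
                BN[j] += 1  # bad occurrence: labels disagree
    return N, GN, BN, N_C


X = [(0.0,), (1.0,), (2.5,), (10.0,)]
y = ["a", "a", "b", "b"]
N, GN, BN, N_C = occurrence_counts(X, y, k=1)
for j in range(len(X)):
    # Both decompositions of N_k(x) hold for every instance.
    assert N[j] == GN[j] + BN[j] == sum(N_C[j].values())
print(N, GN, BN)  # [1, 2, 1, 0] [1, 1, 1, 0] [0, 1, 0, 0]
```

In this toy example the instance at 1.0 is a neighbor of one instance from each class, so $N_{1,a} = N_{1,b} = 1$ for it, and one of its two occurrences is bad.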
As we mentioned, hubs appear in data with high (intrinsic) dimensionality;
therefore, hubness is one of the main aspects of the curse of dimensionality [ 4 ].
However, dimensionality reduction cannot entirely eliminate the issue of bad hubs
unless it induces significant information loss by reducing the data to a very
low-dimensional space, which often ends up hurting system performance even more [ 40 ].
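The link between dimensionality and hubs can be illustrated with synthetic data. This sketch assumes the skewness of the $N_k(x)$ distribution as the hubness measure (a common choice, but not defined in this section), i.i.d. Gaussian data, and Euclidean distance; `skewness` and `hubness_skewness` are hypothetical helper names. Larger positive skewness means a few instances occur in many more $k$-neighborhoods than the rest.

```python
import random
from math import dist


def skewness(values):
    """Skewness m3 / m2**1.5 computed from population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def hubness_skewness(n, d, k, seed=0):
    """Skewness of N_k(x) for n i.i.d. standard Gaussian points in d dimensions."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    N = [0] * n  # N[j] = N_k(x_j), the k-occurrence count of x_j
    for i in range(n):
        # brute-force k nearest neighbors of x_i, excluding x_i itself
        neighbors = sorted((c for c in range(n) if c != i), key=lambda c: dist(X[i], X[c]))[:k]
        for j in neighbors:
            N[j] += 1
    return skewness(N)


for d in (2, 100):
    print(d, round(hubness_skewness(n=200, d=d, k=5), 2))
```

The exact values depend on the seed and sample size; the point of the sketch is only the comparison between the low- and high-dimensional settings.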
11.5 Hubness-Aware Classification of Time-Series
Since the issue of hubness in intrinsically high-dimensional data, such as time series,
cannot be entirely avoided, algorithms that work with high-dimensional data need
to be able to handle hubs properly. Therefore, in this section, we present algorithms
that work under the assumption of hubness. Their mechanisms for handling hubs may be
either explicit or implicit.
Several hubness-aware classification methods have recently been proposed. An
instance-weighting scheme that reduces the bad influence of hubs during voting was
first proposed in [ 43 ]. An extension of the fuzzy $k$-nearest neighbor framework
was shown to be somewhat better on average [ 54 ], introducing the concept of
class-conditional hubness of neighbor points and building an occurrence model which is