Fig. 11.4 Distribution of $GN_1(x)$ for some time series datasets (panels: SonyAIBORobotSurface,
CBF, FacesUCR). The horizontal axis corresponds to the values of $GN_1(x)$, while on the
vertical axis one can see how many instances have that value
hubness-aware classifiers, the ones we present in the next section, are also relevant
for the classification of imbalanced data.
The total occurrence count of an instance $x$ can be decomposed into good and bad
occurrence counts: $N_k(x) = GN_k(x) + BN_k(x)$. More generally, we can decompose
the total occurrence count into the class-conditional counts:
$N_k(x) = \sum_{C \in \mathcal{C}} N_{k,C}(x)$, where $N_{k,C}(x)$ denotes how many times
$x$ occurs as one of the $k$ nearest neighbors of instances belonging to class $C$, i.e.,

$$N_{k,C}(x) = \left|\{\, x_i \mid x \in \mathcal{N}_k(x_i) \wedge y_i = C \,\}\right| \qquad (11.8)$$

where $y_i$ denotes the class label of $x_i$.
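To make the decomposition concrete, the following minimal sketch computes the total, good, bad and class-conditional occurrence counts on a toy dataset. It is an illustration under stated assumptions, not the implementation used in the cited works: Euclidean distance stands in for whatever time-series distance is actually used, an occurrence is taken to be good when the neighbor and the instance it is a neighbor of share a label, and the names `k_nearest_neighbors` and `occurrence_counts` are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+); an assumption of this sketch


def k_nearest_neighbors(data, i, k):
    """Indices of the k nearest neighbors of data[i], excluding i itself."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: dist(data[i], data[j]))
    return others[:k]


def occurrence_counts(data, labels, k):
    """Return (N_k, GN_k, BN_k, N_kC) as lists indexed by instance.

    N_kC[j][c] counts how often instance j appears among the k nearest
    neighbors of instances whose class label is c (Eq. 11.8).
    """
    n = len(data)
    class_conditional = [Counter() for _ in range(n)]
    for i in range(n):
        for j in k_nearest_neighbors(data, i, k):
            class_conditional[j][labels[i]] += 1

    # N_k(x) is the sum of the class-conditional counts over all classes.
    total = [sum(c.values()) for c in class_conditional]
    # Assumed definition: a good occurrence is one where the labels match.
    good = [class_conditional[j][labels[j]] for j in range(n)]
    bad = [total[j] - good[j] for j in range(n)]
    return total, good, bad, class_conditional


# Toy one-dimensional data, chosen so that no distance ties occur.
data = [(0.0,), (1.0,), (2.5,), (10.0,), (11.0,)]
labels = ["a", "a", "b", "b", "b"]
total, good, bad, per_class = occurrence_counts(data, labels, k=1)
print(total, good, bad)
```

In this toy example the second instance is the nearest neighbor of one instance from class "a" and one from class "b", so its total count of 2 splits into one good and one bad occurrence. The quadratic neighbor search is kept for readability only.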
As we mentioned, hubs appear in data with high (intrinsic) dimensionality;
hubness is therefore one of the main aspects of the curse of dimensionality [4].
However, dimensionality reduction cannot entirely eliminate the issue of bad hubs
unless it induces significant information loss by reducing to a very low dimensional
space, which often ends up hurting system performance even more [40].
11.5 Hubness-Aware Classification of Time-Series
Since the issue of hubness in intrinsically high-dimensional data, such as time-series,
cannot be entirely avoided, the algorithms that work with high-dimensional data need
to be able to properly handle hubs. Therefore, in this section, we present algorithms
that work under the assumption of hubness. These mechanisms might be either
explicit or implicit.
Several hubness-aware classification methods have recently been proposed. An
instance-weighting scheme was first proposed in [43], which reduces the bad influence
of hubs during voting. An extension of the fuzzy $k$-nearest neighbor framework
was shown to be somewhat better on average [54], introducing the concept of
class-conditional hubness of neighbor points and building an occurrence model which is