5.5.2 One-Class Learning
Raskutti and Kowalczyk [26] and Kowalczyk and Raskutti [27] have presented two extreme rebalancing methods for training SVMs with highly imbalanced datasets. In the first method, they train an SVM model using only the minority class examples. In the second method, the DEC method is extended to assign a misclassification cost of C⁻ = 0 to the majority class examples and C⁺ = 1/N⁺ to the minority class examples, where N⁺ is the number of minority class examples. In experiments on several heavily imbalanced synthetic and real-world datasets, these methods were observed to be more effective than general data rebalancing methods.
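The first extreme rebalancing method, fitting a model to the minority class alone, can be sketched with scikit-learn's OneClassSVM as a stand-in for the one-class learner; the synthetic dataset, kernel parameters, and ν value below are illustrative assumptions, not choices from [26, 27]:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_min = rng.normal(loc=2.0, scale=0.5, size=(30, 2))     # minority class
X_maj = rng.normal(loc=-2.0, scale=0.5, size=(1000, 2))  # majority class

# First extreme rebalancing method: fit using only minority examples.
# nu bounds the fraction of training points treated as outliers.
ocsvm = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(X_min)

pred_min = ocsvm.predict(X_min)  # +1 = inside the learned minority region
pred_maj = ocsvm.predict(X_maj)  # -1 = outside it
```

Because the majority class never enters training, the boundary encloses the minority region regardless of how heavily the classes are imbalanced, which is the intent of the extreme rebalancing scheme.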
5.5.3 zSVM
zSVM is another algorithmic modification proposed for SVMs in [28] to learn from imbalanced datasets. In this method, an SVM model is first developed using the original imbalanced training dataset. Then, the decision boundary of the resulting model is modified to remove its bias toward the majority (negative) class. Consider the standard SVM decision function given in Equation 5.9, which can be rewritten as follows:
f(x) = \mathrm{sign}\Bigl(\sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b\Bigr)
     = \mathrm{sign}\Bigl(\sum_{i=1}^{l_1} \alpha_i y_i K(x_i, x) + \sum_{j=1}^{l_2} \alpha_j y_j K(x_j, x) + b\Bigr)    (5.13)
where α_i are the coefficients of the positive support vectors, α_j are the coefficients of the negative support vectors, and l_1 and l_2 represent the numbers of positive and negative training examples, respectively. In the zSVM method, the magnitude of the α_i values of the positive support vectors is increased by multiplying all of them by a particular small positive value z. Then, the modified SVM decision function can be represented as follows:
f(x) = \mathrm{sign}\Bigl(z \sum_{i=1}^{l_1} \alpha_i y_i K(x_i, x) + \sum_{j=1}^{l_2} \alpha_j y_j K(x_j, x) + b\Bigr)    (5.14)
This modification of α_i increases the weights of the positive support vectors in the decision function and therefore decreases its bias toward the majority (negative) class. In [28], the value of z giving the best classification results for the training dataset was selected as the optimal value.
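The coefficient scaling in Equation 5.14 can be sketched on top of a fitted scikit-learn SVC, whose `dual_coef_` attribute stores the products α_i y_i for the support vectors (positive entries correspond to the positive class when it is `classes_[1]`). The dataset, kernel width, and value of z below are illustrative assumptions, not the tuned choice of [28]:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

GAMMA = 0.5  # fixed so the kernel can be recomputed outside sklearn


def zsvm_decision(clf, X, z):
    """Decision values of Equation 5.14: positive-SV coefficients scaled by z."""
    coef = clf.dual_coef_[0].copy()   # entries are alpha_i * y_i
    coef[coef > 0] *= z               # scale only the positive (minority) SVs
    K = rbf_kernel(X, clf.support_vectors_, gamma=GAMMA)
    return K @ coef + clf.intercept_[0]


# Illustrative imbalanced dataset: class 1 is the minority (positive) class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (200, 2)), rng.normal(1.0, 1.0, (20, 2))])
y = np.array([0] * 200 + [1] * 20)

clf = SVC(kernel="rbf", gamma=GAMMA).fit(X, y)
```

With z = 1 the function reproduces the unmodified SVM decision values; with z > 1 and a nonnegative kernel such as the RBF, every decision value can only move toward the positive (minority) class, which is the bias correction the method aims for.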