Table 2.3. Results for the Down's syndrome problem using the FLAGID method. The first two
columns give the accuracy of the test on the 4815 patterns not included in the set of 3071
training patterns. The type of dataset refers to the kind of data used: MoM or physical. The
type of output can be symmetric or non-symmetric. Discarded RecBFs indicates whether the
solution was found by discarding the least representative RecBFs. The last two columns refer
to the accuracy on the training dataset, training always with the stratified half of the patterns.
%TP(4815)  %FP(4815)  type of dataset  type of output  discarded RecBFs  #rules  %TP(3071)  %FP(3071)
60%        9.69%      physical         Symmetric       0%                4       81.82%     8.39%
66.66%     10.21%     MoM              Symmetric       0%                6       81.82%     7.25%
73.33%     13.56%     MoM              Symmetric       0%                6       90.91%     10.49%
80%        14.48%     physical         Symmetric       0%                4       100%       12.87%
beginning of this chapter, we want to find a good solution that balances the %TP and the
%TN. In the case of the Down's syndrome problem, the %FP is taken into account
rather than the %TN. To find the best solution, a threshold must be placed on one of
these two indexes.
The different rows in Table 2.3 show the best %FP obtained for different thresholds of %TP.
The best results are in the first three rows, which minimize the %FP. In the
Down's syndrome problem, a FP is a case where the method classifies a fetus as positive
when it is actually negative; in that case the mother would undergo an invasive test,
which carries a 1% probability of losing the child, to be 100% sure of the result.
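The two indexes discussed above can be computed directly from the confusion counts. A minimal sketch (the function name and the toy labels are illustrative, not taken from the chapter):

```python
def tp_fp_rates(y_true, y_pred):
    """Return (%TP, %FP): true positives over all real positives,
    false positives over all real negatives, as percentages."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(y_true)            # number of real positives
    neg = len(y_true) - pos      # number of real negatives
    return 100.0 * tp / pos, 100.0 * fp / neg

# toy example: 2 real positives, 3 real negatives
tpr, fpr = tp_fp_rates([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
# tpr = 50.0 (1 of 2 positives caught), fpr = 33.3 (1 of 3 negatives flagged)
```

Sweeping a classifier's decision threshold and recording these two values at each step yields a row like those of Table 2.3 for each chosen %TP level.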
In all the cases shown in Table 2.3, no obtained RecBF was discarded and the output
variable has a symmetric distribution of its membership functions. Moreover, what
makes these results an improvement over current methods is the very small number of
rules found: between 4 and 6. This makes the system very understandable and
hence very well suited to the task of extracting intelligible fuzzy rules.
2.4.2 Comparison with Other Methods
In order to know whether the FLAGID method can be applied to the classification of any
imbalanced dataset, a comparison with other methods specialized in dealing with
imbalanced datasets is needed.
Table 2.4 shows this comparison with two other methods for imbalanced datasets:
KBA and SDC. These are two of the best methods for imbalanced datasets,
with very good results on datasets from the UCI repository [25]. These datasets will
be used for the comparison.
The SDC method (SMOTE with Different Costs) [6] combines SVM and SMOTE to
solve a problem that appears in SVM when the dataset is imbalanced: the decision
border is always located too near to the minority class. The algorithm applies the
modified SVM function proposed by Veropoulos, Campbell and Cristianini [26], shown
in Equation (2). This SVM function assigns different costs to errors in the positive
class and errors in the negative class. The SDC method uses this function in
combination with an oversampling method called SMOTE [3].
\[
L_p(w, b, \alpha) = \frac{1}{2}\|w\|^2
+ C^{+}\sum_{i \mid y_i = +1}^{n} \xi_i
+ C^{-}\sum_{j \mid y_j = -1}^{n} \xi_j
- \sum_{i=1}^{n} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 + \xi_i \right]
- \sum_{i=1}^{n} \beta_i \xi_i
\tag{2}
\]
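To make the SMOTE idea concrete, here is a rough sketch in plain NumPy (my own illustrative implementation under stated assumptions, not the authors' code): each synthetic minority sample is created by interpolating between a minority-class point and one of its k nearest minority-class neighbours.

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, rng=None):
    """Generate n_synthetic new minority samples by interpolating
    between a minority point and a random one of its k nearest
    minority neighbours (the core idea behind SMOTE)."""
    rng = np.random.default_rng(rng)
    n, d = X_min.shape
    # pairwise distances within the minority class only
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :min(k, n - 1)]
    out = np.empty((n_synthetic, d))
    for s in range(n_synthetic):
        i = rng.integers(n)                 # pick a random minority point
        j = rng.choice(neighbours[i])       # pick one of its neighbours
        gap = rng.random()                  # position along the segment
        out[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return out
```

Because every synthetic point lies on a segment between two existing minority points, the oversampled set stays inside the convex hull of the minority class rather than simply duplicating points, which is what pushes the SVM border away from the minority class.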
Custom Search