Table 2.3. Results for the Down's syndrome problem using the FLAGID method. The first two
columns give the accuracy of the test on the 4815 patterns not included in the set of 3071
training patterns. The type of dataset refers to the kind of data used: MoM or physical. The
type of output can be symmetric or non-symmetric. Discarded RecBFs indicates whether the
solution was found by discarding the least representative RecBFs. The last two columns refer
to the accuracy on the training dataset, training always with the stratified half of the patterns.
%TP(4815)  %FP(4815)  type of dataset  type of output  discarded RecBFs  #rules  %TP(3071)  %FP(3071)
60%        9.69%      physical         Symmetric       0%                4       81.82%     8.39%
66.66%     10.21%     MoM              Symmetric       0%                6       81.82%     7.25%
73.33%     13.56%     MoM              Symmetric       0%                6       90.91%     10.49%
80%        14.48%     physical         Symmetric       0%                4       100%       12.87%
beginning of this chapter, we want to find a good solution that balances the %TP and the
%TN. In the case of the Down's syndrome problem, the %FP is taken into account
rather than the %TN. To find the best solution, a threshold must be placed on one of
these two indexes.
The different rows in Table 2.3 show the best %FP obtained for different thresholds of %TP.
The best results are in the first three rows, which minimize the %FP. In the
Down's syndrome problem, a FP is a case where the method classifies a fetus as positive
when it is actually negative; in that case the mother would undergo an invasive test,
which carries a 1% probability of losing the child, to be 100% sure of the result.
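The two indexes discussed above can be computed directly from the confusion counts. A minimal sketch (the function name and the toy labels are illustrative, not taken from the chapter):

```python
def tp_fp_rates(y_true, y_pred):
    """Return (%TP, %FP): true positives over all real positives,
    false positives over all real negatives, as percentages."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(y_true)            # number of real positives
    neg = len(y_true) - pos      # number of real negatives
    return 100.0 * tp / pos, 100.0 * fp / neg

# toy example: 2 real positives, 3 real negatives
tpr, fpr = tp_fp_rates([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
# tpr = 50.0 (1 of 2 positives caught), fpr = 33.3 (1 of 3 negatives flagged)
```

Sweeping a classifier's decision threshold and recording these two values at each step yields a row like those of Table 2.3 for each chosen %TP level.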
In all the cases shown in Table 2.3, no obtained RecBF was discarded and the output
variable has a symmetric distribution of its membership functions. Moreover, what
makes these results an improvement over current methods is the very small number of
rules found: between 4 and 6. This makes the system very understandable and
hence very well suited to the task of extracting intelligible fuzzy rules.
2.4.2 Comparison with Other Methods
In order to know whether the FLAGID method can be applied to the classification of any
imbalanced dataset, a comparison with other methods specialized in dealing with
imbalanced datasets is needed.
Table 2.4 shows this comparison with two other methods for imbalanced datasets:
KBA and SDC. These are two of the best methods for imbalanced datasets,
with very good results on datasets from the UCI repository [25]. These datasets will
be used for the comparison.
The SDC method (SMOTE with Different Costs) [6] combines SVM and SMOTE to
solve a problem that appears in SVM when the dataset is imbalanced: the decision
border is always located too near to the minority class. The algorithm applies the
modified SVM function proposed by Veropoulos, Campbell and Cristianini [26], shown
in Equation (2). This SVM function assigns different costs to errors in the positive
class and errors in the negative class. The SDC method uses this function in
combination with an oversampling method called SMOTE [3].
\[
L_p(w, b, \alpha) = \frac{1}{2}\|w\|^2
+ C^{+}\sum_{i \mid y_i = +1}^{n} \xi_i
+ C^{-}\sum_{j \mid y_j = -1}^{n} \xi_j
- \sum_{i=1}^{n} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 + \xi_i \right]
- \sum_{i=1}^{n} \beta_i \xi_i
\tag{2}
\]
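To make the SMOTE idea concrete, here is a rough sketch in plain NumPy (my own illustrative implementation under stated assumptions, not the authors' code): each synthetic minority sample is created by interpolating between a minority-class point and one of its k nearest minority-class neighbours.

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, rng=None):
    """Generate n_synthetic new minority samples by interpolating
    between a minority point and a random one of its k nearest
    minority neighbours (the core idea behind SMOTE)."""
    rng = np.random.default_rng(rng)
    n, d = X_min.shape
    # pairwise distances within the minority class only
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :min(k, n - 1)]
    out = np.empty((n_synthetic, d))
    for s in range(n_synthetic):
        i = rng.integers(n)                 # pick a random minority point
        j = rng.choice(neighbours[i])       # pick one of its neighbours
        gap = rng.random()                  # position along the segment
        out[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return out
```

Because every synthetic point lies on a segment between two existing minority points, the oversampled set stays inside the convex hull of the minority class rather than simply duplicating points, which is what pushes the SVM border away from the minority class.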
Custom Search