Extracting a Fuzzy System by Using Genetic Algorithms for Imbalanced Datasets Classification: Application on Down’s Syndrome Detection - Mining Complex Data

the algorithm, which usually consist of applying different costs (depending on the class) to the misclassified cases.

The second direction centres on developing new learning methods that solve the imbalance problem, which classic algorithms cannot handle.

The resampling strategy has two approaches: oversampling and undersampling. Oversampling increases the number of patterns in the minority class (e.g. [3]), and undersampling decreases the number of examples of the majority class [4]. These two techniques are the most commonly used because they make it possible to reuse existing classification methods designed for balanced datasets, such as SVM [5, 23], neural networks [5], and others [4]. Two examples of SVM-based methods are SDC and KBA. SDC [6] (SMOTE with Different Costs) combines SVM and SMOTE, using different learning costs to address the problem of a boundary that lies too close to the positive instances. KBA [7] (Kernel Boundary Alignment) is an SVM-based algorithm that also modifies the boundary.
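The two resampling approaches described above can be illustrated with a minimal sketch. It assumes a binary problem in which the minority class is strictly smaller than the majority class, and it uses plain random duplication and random removal. The function names and the toy data are hypothetical, and this is not SMOTE, which generates synthetic patterns rather than copying existing ones.

```python
import random
from collections import Counter


def _split_indices(labels, minority):
    """Return (minority indices, majority indices) for a binary label list."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    return minority_idx, majority_idx


def random_oversample(samples, labels, minority, seed=0):
    """Duplicate random minority patterns until both classes have equal size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    missing = len(majority_idx) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(missing)]
    keep = majority_idx + minority_idx + extra
    return [samples[i] for i in keep], [labels[i] for i in keep]


def random_undersample(samples, labels, minority, seed=0):
    """Drop random majority patterns until both classes have equal size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    kept_majority = rng.sample(majority_idx, len(minority_idx))
    keep = kept_majority + minority_idx
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy data (illustrative only): 2 minority patterns and 8 majority patterns.
X = [[float(i)] for i in range(10)]
y = [1, 1] + [0] * 8

X_over, y_over = random_oversample(X, y, minority=1)
X_under, y_under = random_undersample(X, y, minority=1)
print(Counter(y_over))   # 8 patterns per class
print(Counter(y_under))  # 2 patterns per class
```

Either balanced set can then be passed to a standard classifier. Note that oversampling repeats existing patterns, while undersampling discards six of the eight majority patterns in this toy case.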

Another strategy is to assign class-dependent costs to misclassified patterns in order to improve classification accuracy. Domingos [8], Zadrozny and Elkan [9], Meler et al. [10], and others present this technique as a generic modification of the algorithm.
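As a rough illustration of class-dependent costs, the sketch below sums a separate cost for each kind of error and derives the probability threshold at which predicting the positive class becomes the cheaper decision. It assumes that correct predictions cost nothing and that positives are labelled 1. The function names and the cost values (10 and 1) are hypothetical and are not taken from the cited works.

```python
def total_cost(y_true, y_pred, cost_fn, cost_fp):
    """Sum class-dependent costs over the misclassified cases (positive = 1)."""
    cost = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 0:
            cost += cost_fn  # a positive case was missed
        elif truth == 0 and pred == 1:
            cost += cost_fp  # a negative case was flagged
    return cost


def min_cost_threshold(cost_fn, cost_fp):
    """Probability above which predicting positive has the lower expected cost.

    Predicting positive costs (1 - p) * cost_fp on average and predicting
    negative costs p * cost_fn, so positive is cheaper when
    p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


# Illustrative values: missing a positive is ten times worse than a false alarm.
y_true = [1, 0, 0, 0, 1]
y_pred = [0, 0, 1, 0, 1]
print(total_cost(y_true, y_pred, cost_fn=10, cost_fp=1))  # one miss + one false alarm
print(min_cost_threshold(cost_fn=10, cost_fp=1))
```

With these costs the threshold falls well below 0.5, so a classifier that outputs probabilities would label a case positive even when it considers the positive class fairly unlikely.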

The above solutions do not always give the desired results; this depends on the particular imbalanced problem. Hence there have been many attempts to solve the problem with new methods developed specifically to work with imbalanced datasets [11]. For example, Visa and Ralescu [24] proposed a fuzzy classifier for imbalanced datasets and overlapping classes, and Zhang et al. [12] presented a very simple and effective method, called RLSD (Rule for Learning Skewed Data), to generate rules from highly imbalanced datasets.

Regarding previous work on Down's syndrome detection using soft computing methods, we refer to the work of M. Sordo [21], who used RBF networks. The results obtained were 84% true positives and 35% false positives, which are worse than our results, as will be seen in the Experimental Results section.
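For reference, true positive and false positive percentages of this kind are usually computed from the confusion counts as shown in the sketch below. The labels are toy data made up for illustration and have no relation to the results reported by Sordo or in this chapter; the function name is hypothetical.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate), with positive = 1."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:
            if pred == 1:
                fp += 1
            else:
                tn += 1
    # TPR: share of real positives that were detected.
    # FPR: share of real negatives that were wrongly flagged.
    return tp / (tp + fn), fp / (fp + tn)


# Toy labels: 4 positive cases (3 detected) and 10 negative cases (2 flagged).
y_true = [1, 1, 1, 1] + [0] * 10
y_pred = [1, 1, 1, 0] + [1, 1] + [0] * 8
tpr, fpr = tpr_fpr(y_true, y_pred)
print(f"TPR = {tpr:.0%}, FPR = {fpr:.0%}")
```

Both rates are normalised per class, so they remain informative on an imbalanced dataset, where overall accuracy is dominated by the majority class.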

However, no such method or algorithm has been applied specifically to the Down's syndrome problem, because medical experts have openly expressed their distrust of approaches such as resampling, on the grounds that these would either create new patterns that do not exist in real life or erase patterns with small details that could be relevant to the final solution.

2.3 Proposed Method

The method proposed in this chapter aims to obtain an accurate Fuzzy Classification System from an imbalanced dataset. The method consists of several steps, which are shown in Fig. 2.1. First, the DDA/RecBF clustering algorithm is used to obtain an initial set of fuzzy membership functions from the dataset. Then, those functions are recombined by a special method called ReRecBF (Recombined RecBF), which is also presented in this chapter. Finally, from the recombined set of membership functions and the dataset, a set of fuzzy rules is obtained by means of a Genetic Algorithm.
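To make the end product of this pipeline more concrete, the following sketch shows how a small fuzzy rule base might classify a pattern once membership functions and rules are available. It is not an implementation of DDA/RecBF, ReRecBF, or the Genetic Algorithm. The trapezoidal shape of the membership functions, the use of the minimum to combine antecedents, the winner-takes-all decision, and all names and numeric values are assumptions made for illustration.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in a trapezoidal set with support [a, d] and core [b, c]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


def firing_strength(rule, sample):
    """Combine the antecedent memberships with the minimum (assumed operator)."""
    return min(trapezoid(sample[i], *mf) for i, mf in rule["if"])


def classify(rules, sample):
    """Return the class of the rule that fires most strongly (assumed decision)."""
    best = max(rules, key=lambda rule: firing_strength(rule, sample))
    return best["then"]


# Hypothetical rule base over one feature (index 0); values are illustrative.
rules = [
    {"if": [(0, (0.0, 0.0, 2.0, 4.0))], "then": "negative"},
    {"if": [(0, (2.0, 4.0, 10.0, 10.0))], "then": "positive"},
]

print(classify(rules, [1.0]))  # inside the core of the first set
print(classify(rules, [3.5]))  # in the overlap, closer to the second set
```

In the proposed method, the tuples playing the role of `(a, b, c, d)` would come from the recombined membership functions, and the choice of which antecedents and classes form each rule would be left to the Genetic Algorithm.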
