Extracting a Fuzzy System by Using Genetic Algorithms for Imbalanced Datasets Classification: Application on Down’s Syndrome Detection - Mining Complex Data

Therefore, the following aspects are taken into account when training the DDA/RecBF algorithm, as argued in [17]:

1. The shrinking operation can be done either in all dimensions or in only one (which loses less hyper-area when shrinking). In our case, the latter is used, as proposed by the authors of the DDA/RecBF algorithm in [13], because it causes less granulation of the RecBFs.

2. The algorithm is trained on the dataset sorted by class, first the major-class and then the minor-class. Because of the way the DDA organizes and creates RecBFs, this ordering reduces the number of RecBFs (membership functions) of the minor-class. The minor-class thus ends up with the minimum number of membership functions, while the major-class is free to organize itself into as many membership functions as it needs.
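The two training choices above can be illustrated with a minimal sketch. It is not the DDA/RecBF implementation: the function names, the representation of a RecBF as a box with `lo`/`hi` bounds, and the `anchor` point that the box must keep are all assumptions made for illustration.

```python
from collections import Counter
from math import prod


def order_major_first(patterns, labels):
    """Return (pattern, label) pairs with the most frequent class first."""
    counts = Counter(labels)
    # Stable sort: the original order inside each class is preserved.
    return sorted(zip(patterns, labels),
                  key=lambda pl: (-counts[pl[1]], str(pl[1])))


def shrink_one_dimension(lo, hi, anchor, conflict):
    """Shrink the box [lo, hi] in one dimension only.

    Hypothetical simplification: the box is cut at the conflicting
    pattern, on the side away from `anchor`, in whichever single
    dimension leaves the largest remaining volume (hyper-area).
    Returns (volume, dimension, new_lo, new_hi), or None when the
    conflicting pattern coincides with the anchor in every dimension.
    """
    best = None
    for d in range(len(lo)):
        new_lo, new_hi = list(lo), list(hi)
        if conflict[d] < anchor[d]:
            new_lo[d] = conflict[d]
        elif conflict[d] > anchor[d]:
            new_hi[d] = conflict[d]
        else:
            continue  # no cut in this dimension can separate the two points
        volume = prod(h - l for l, h in zip(new_lo, new_hi))
        if best is None or volume > best[0]:
            best = (volume, d, new_lo, new_hi)
    return best
```

For a box spanning [0, 10] in two dimensions, anchored at (5, 5), with a conflicting pattern at (9, 6), cutting dimension 0 keeps a volume of 90 while cutting dimension 1 keeps only 60, so the sketch shrinks dimension 0.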

The following subsection explains how to obtain new RecBFs by recombining the

existing ones, in order to adapt them to work with imbalanced datasets.

2.3.2 ReRecBF Algorithm: Recombining Rectangular Basis Functions (RecBFs)

Since the shrinking method in the RecBF algorithm is performed in only one dimension, the algorithm produces superposed membership functions (Fig. 2.4(4)) when working with datasets whose classes overlap (boundary not well defined). These membership functions are not adequate for obtaining an accurate set of fuzzy rules from the imbalanced dataset. Tests on different datasets showed that the membership functions obtained from the DDA/RecBF algorithm were not discriminant enough and that some transformations of these membership functions were needed.

On these grounds we propose the following (as argued in [17]):

1. Take only the intervals obtained by the core-regions.

2. Transform the trapezoids belonging to the minor-class into triangles.

3. If possible, discard the less representative RecBFs, that is, RecBFs whose core-region includes fewer than 10% of the patterns of its class.
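Steps 2 and 3 can be sketched for a single variable as follows. The `Trapezoid` type, the placement of the triangle peak at the midpoint of the core, and the reading of "if possible" as "never leave a class without any membership function" are assumptions for illustration; the text does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Trapezoid:
    a: float  # start of the support-region
    b: float  # start of the core-region
    c: float  # end of the core-region
    d: float  # end of the support-region


def to_triangle(t):
    """Collapse the core to a single peak (assumed: the core midpoint)."""
    peak = (t.b + t.c) / 2.0
    return Trapezoid(t.a, peak, peak, t.d)


def core_coverage(t, values):
    """Fraction of the class's values that fall inside the core [b, c]."""
    if not values:
        return 0.0
    return sum(1 for v in values if t.b <= v <= t.c) / len(values)


def discard_unrepresentative(functions, values, threshold=0.10):
    """Keep functions whose core holds at least `threshold` of the values."""
    kept = [f for f in functions if core_coverage(f, values) >= threshold]
    if not kept and functions:
        # Assumed fallback: keep the single most representative function.
        kept = [max(functions, key=lambda f: core_coverage(f, values))]
    return kept
```

In this sketch the coverage is measured on the original core-region, so the filter would be applied before any trapezoid is collapsed into a triangle.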

The core-regions delimit the areas where the training patterns are, and the support-regions (excluding the core ones) are the undefined areas between the core ones. In these undefined regions the algorithm does not know how to classify any patterns that fall there. We can therefore affirm that the only place where we are sure that a pattern belongs to a class is inside the core-regions of that class.
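The difference between the two kinds of region can be made concrete with a trapezoidal membership function for one variable. This is a minimal sketch: the point names a and d for the ends of the support-region and the linear slopes are assumptions, with [b, c] taken as the core-region.

```python
def trapezoid_membership(x, a, b, c, d):
    """Membership degree of x, assuming a <= b <= c <= d.

    Inside the core-region [b, c] the degree is 1. In the rest of the
    support-region it falls off linearly (assumed shape), and outside
    the support-region it is 0.
    """
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0
```

With a = 0, b = 2, c = 4 and d = 6, a value of 3 lies in the core and gets degree 1.0, while values of 1 and 5 lie in the support-region only and get degree 0.5.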

Outside the core-regions, the RecBF represents a grading of the degree to which the instance belongs to the class: the nearer the value is to the core area, the higher the possibility that it belongs to the class. So, in this case, the set of overlapped membership functions belonging to a variable and a class will be split into new ones, selecting only the areas defined by the core-regions. Thanks to this operation, the membership function areas are divided into sectors, which increases the number of patterns matched by the rules found.
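One possible reading of this splitting step, for a single variable and class, is sketched below. The function name and the choice to cut at every core boundary are assumptions; the sketch only produces the new core sectors and does not define their support-areas.

```python
def split_by_cores(cores):
    """Split overlapped core intervals into non-overlapping sectors.

    `cores` is a list of (b, c) core-regions of the membership functions
    of one variable and one class. Every core boundary becomes a cut
    point, and a sector between two consecutive cut points is kept only
    if it lies inside at least one original core. Zero-width cores
    (b == c) produce no sector in this sketch.
    """
    cuts = sorted({p for b, c in cores for p in (b, c)})
    sectors = []
    for left, right in zip(cuts, cuts[1:]):
        mid = (left + right) / 2.0
        if any(b <= mid <= c for b, c in cores):
            sectors.append((left, right))
    return sectors
```

For two overlapped cores (1, 5) and (3, 8), the sketch yields the sectors (1, 3), (3, 5) and (5, 8); for two disjoint cores (1, 2) and (4, 6), the gap between them is dropped and the cores are returned unchanged.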

Finally, the recombination procedure shown in Fig. 2.4 consists of creating new membership functions by splitting the existing ones at their core-regions (points b and c) and eliminating the old ones. For every new membership function created, the support-area will be defined from the minimum to the maximum values of that
