Extracting a Fuzzy System by Using Genetic Algorithms for Imbalanced Datasets Classification: Application on Down’s Syndrome Detection - Mining Complex Data


variable. The membership functions created from examples belonging to the minor class are transformed into triangles (each having its maximum at the average point of its original trapezoid), while the major-class ones are treated as trapezoids.

We can obtain the following conclusions about the proposed transformation:

1. More uncertainty is introduced in the definition of the membership function, because we do not rule out that a membership function can be evaluated over the whole variable space.

2. Every value of a membership function is always greater than 0, so the alpha-cut will always be greater than 0.

3. As the value of a rule is determined by the minimum of the alpha-cuts, and the importance of a class is determined by the maximum of the alpha-cuts of all its rules, we can affirm that the final importance of a class for a pattern can be determined by a single rule, and that this rule must have all its membership functions as close as possible to the values of the example being treated.

4. The minor-class membership functions introduce more uncertainty because they are treated as triangles. If we do not have much information about what that class looks like (the case for highly imbalanced datasets), we can consider an area centred on the average of the membership function's core region, with the membership value decreasing as the distance from that centre grows.
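
Conclusions 2 to 4 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and placing the feet of the triangle a `margin` outside the variable's range is only one possible reading of how the membership degree is kept greater than 0 over the whole variable space.

```python
def triangle_from_core(core_lo, core_hi, var_min, var_max, margin=1.0):
    """Return a triangular membership function peaking at the core's midpoint.

    Assumption: the feet sit `margin` outside [var_min, var_max], so the
    degree stays above 0 anywhere inside the variable's range.
    """
    peak = (core_lo + core_hi) / 2.0
    left, right = var_min - margin, var_max + margin

    def mu(x):
        # Rising edge up to the peak, falling edge after it.
        if x <= peak:
            return (x - left) / (peak - left)
        return (right - x) / (right - peak)

    return mu


def rule_value(degrees):
    """Value of one rule: the minimum of its antecedent degrees."""
    return min(degrees)


def class_importance(rules_degrees):
    """Importance of a class: the maximum over the values of its rules."""
    return max(rule_value(degrees) for degrees in rules_degrees)
```

For example, if a class has two rules whose antecedent degrees for a pattern are `[0.9, 0.2]` and `[0.6, 0.7]`, the rule values are 0.2 and 0.6, so the class importance is 0.6 and is set by the second rule alone.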

Thanks to these characteristics, the membership functions are less rigid and have many more possibilities to participate in every rule. Thus, we give the system more flexible tools to adapt itself to the given dataset.

The following subsection explains how the rule set is extracted by a genetic algorithm, once the membership functions have been calculated by the previously specified procedure.

2.3.3 Obtaining the Rule Set: The Genetic Algorithm

The encoding of one chromosome of our genetic algorithm is expressed as follows: (x_{1,1}, ..., x_{1,n}, x_{2,1}, ..., x_{2,n}, ..., x_{m,1}, ..., x_{m,n}), where n is the number of variables (input variables plus output variables) and m is the number of rules. x_{i,j} is the value that a gene can take, an integer in the interval [0, n_fuzzysets_j], where n_fuzzysets_j is the number of membership functions of the j-th variable. If x_{i,j} has the value 0, the j-th variable is not present in rule i. If the value 0 is assigned to the output fuzzy set, the rule is not taken into account when the rule set is evaluated. The system is therefore able to find a rule set with fewer than m rules, simply by putting 0 in the output fuzzy set of a rule. Each block x_{i,1}, ..., x_{i,n} corresponds to one rule of the system.
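
The encoding described above can be decoded roughly as follows. This is a hedged sketch rather than the authors' code: `decode_chromosome` is a hypothetical name, and it assumes a single output variable stored as the last gene of each rule block.

```python
def decode_chromosome(chromosome, n_vars):
    """Split a flat chromosome into rules, one block of n_vars genes per rule.

    Assumption: the output variable is the last gene of each block.
    A gene value of 0 means "variable absent"; a 0 in the output gene
    disables the whole rule.
    """
    if len(chromosome) % n_vars != 0:
        raise ValueError("chromosome length must be a multiple of n_vars")
    rules = []
    for start in range(0, len(chromosome), n_vars):
        *inputs, output = chromosome[start:start + n_vars]
        if output == 0:
            continue  # rule switched off, so fewer than m rules remain
        # Keep only the input variables that take part in the rule.
        antecedent = {j: fs for j, fs in enumerate(inputs) if fs != 0}
        rules.append((antecedent, output))
    return rules
```

With `n_vars = 3` (two inputs and one output), the chromosome `[1, 0, 2, 3, 1, 0, 0, 2, 1]` encodes three rule slots, but the second one has output 0 and is dropped, so only two rules are active.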

The initial population is either generated randomly or seeded with an initial set of rules. Every gene of a chromosome is generated randomly in the interval [0, n_fuzzysets_j], but some rules can be fixed for the entire simulation or just given as an initial set of rules, if needed. If a fuzzy set is 0, the corresponding variable is not taken into account in the rule.

In the case of this chapter, the initial rules have been generated randomly and no restriction has been applied to them.
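
The initialisation step could be sketched in Python as below. The names `random_chromosome` and `initial_population`, the `seed` parameter, and the choice to copy any fixed rules into the first rule slots are assumptions made for illustration; the chapter itself used purely random rules with no restriction.

```python
import random


def random_chromosome(n_fuzzysets, m_rules, rng, fixed_rules=()):
    """Build one chromosome holding m_rules rules.

    n_fuzzysets[j] is the number of membership functions of variable j;
    each gene is drawn uniformly from 0..n_fuzzysets[j] (0 = absent).
    Assumption: fixed_rules are copied verbatim into the first slots.
    """
    if len(fixed_rules) > m_rules:
        raise ValueError("more fixed rules than rule slots")
    chromosome = []
    for i in range(m_rules):
        if i < len(fixed_rules):
            chromosome.extend(fixed_rules[i])
        else:
            # randint includes both end points, matching [0, n_fuzzysets_j].
            chromosome.extend(rng.randint(0, k) for k in n_fuzzysets)
    return chromosome


def initial_population(size, n_fuzzysets, m_rules, seed=None, fixed_rules=()):
    """Create `size` chromosomes, optionally sharing some fixed rules."""
    rng = random.Random(seed)
    return [random_chromosome(n_fuzzysets, m_rules, rng, fixed_rules)
            for _ in range(size)]
```

Calling `initial_population(20, [3, 2, 2], 5)` would produce 20 chromosomes of 15 genes each, for two input variables with 3 and 2 fuzzy sets and one output variable with 2 fuzzy sets.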
