age, maternal age, the levels of certain hormones, etc. The risk is usually reported as
odds, which express the estimated probability of the condition.
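As a small illustration of how odds relate to probability, the sketch below converts
odds of the form a:b (a cases for, b against) into a probability a / (a + b). The
function name and the example figures are assumptions chosen for illustration; they are
not taken from any screening protocol.

```python
from fractions import Fraction


def odds_to_probability(in_favour: int, against: int) -> Fraction:
    """Convert odds 'in_favour : against' into a probability.

    Hypothetical helper for illustration only.
    """
    if in_favour < 0 or against < 0 or in_favour + against == 0:
        raise ValueError("odds must be non-negative and not both zero")
    return Fraction(in_favour, in_favour + against)


# Illustrative figures only: odds of 1:249 correspond to a risk of 1 in 250.
risk = odds_to_probability(1, 249)
print(risk, float(risk))
```

Using exact fractions avoids rounding noise when very small risks are compared.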
Several screening methods are used to estimate this risk, and they differ depending on
whether the first or the second trimester of pregnancy is being evaluated. Currently,
screening methods for the first trimester of pregnancy have an accuracy of 80-90%, and
those for the second trimester only 60-70%, with a false-positive rate of 5% for the
latter, although in practice it is usually around 8% [18].
The data obtained in the prenatal Down's syndrome detection problem form a two-class
imbalanced dataset. An imbalanced dataset is one in which the number of cases in one
class differs greatly from the number in the remaining classes. Here the dataset has
two classes (the fetus has or does not have Down's syndrome), and fetuses with Down's
syndrome (positive class) are far fewer than healthy ones (negative class). With
respect to the number of cases, from now on we will refer to the negative class as the
major-class and the positive class as the minor-class.
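The effect of this imbalance can be sketched with a few lines of Python. The labels
below are synthetic (990 negative cases and 10 positive cases are assumed purely for
illustration, not taken from the study's data), and the helper name is hypothetical.
The sketch shows why plain accuracy is a poor guide here: a classifier that always
predicts the major-class looks accurate while detecting no positive case at all.

```python
from collections import Counter


def class_summary(labels):
    """Return (major_class, minor_class, imbalance_ratio) for a label list."""
    counts = Counter(labels)
    major = max(counts, key=counts.get)
    minor = min(counts, key=counts.get)
    return major, minor, counts[major] / counts[minor]


# Synthetic labels, assumed for illustration: 0 = healthy, 1 = Down's syndrome.
labels = [0] * 990 + [1] * 10
major, minor, ratio = class_summary(labels)

# A trivial classifier that always answers with the major-class.
predictions = [major] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
positives = [p for p, y in zip(predictions, labels) if y == minor]
detection_rate = sum(p == minor for p in positives) / len(positives)

print(f"ratio={ratio}, accuracy={accuracy}, detection_rate={detection_rate}")
```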
The aim of our study is to improve on the results above, obtained by the screening
methods in the second trimester of pregnancy, and, as far as possible, to extract an
understandable set of rules. We pursue this goal with a new Soft Computing method
based on Fuzzy Logic and designed to work with imbalanced datasets.
The Soft Computing method is called FLAGID (Fuzzy Logic And Genetic algorithms for
Imbalanced Datasets). FLAGID proceeds in three steps: it uses a clustering algorithm
called DDA/RecBF to obtain a first set of trapezoidal membership functions from the
dataset, recombines those functions to obtain new ones, and finally uses a Genetic
Algorithm to obtain a set of fuzzy rules from the recombined set of membership
functions and the dataset. The result is expressed as a Fuzzy System.
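To make the notion of a trapezoidal membership function concrete, the following
minimal sketch evaluates one such function and the firing strength of a single fuzzy
rule. It does not reproduce DDA/RecBF, the recombination step, or the Genetic
Algorithm. The parameter names (a, b, c, d), the use of the minimum as the fuzzy AND,
and all numeric values are assumptions made for illustration.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: support [a, d], core [b, c], with a <= b <= c <= d."""
    if not (a <= b <= c <= d):
        raise ValueError("expected a <= b <= c <= d")
    if b <= x <= c:          # inside the core: full membership
        return 1.0
    if x <= a or x >= d:     # outside the support: no membership
        return 0.0
    if x < b:                # rising edge between a and b
        return (x - a) / (b - a)
    return (d - x) / (d - c)  # falling edge between c and d


def rule_strength(memberships):
    """Firing strength of a rule whose antecedents are joined by AND.

    Assumption: the minimum is used as the AND operator.
    """
    return min(memberships)


# Hypothetical rule: IF x1 is A AND x2 is B, with illustrative trapezoids.
mu_a = trapezoid(15, 10, 20, 30, 40)   # on the rising edge
mu_b = trapezoid(25, 10, 20, 30, 40)   # inside the core
strength = rule_strength([mu_a, mu_b])
print(mu_a, mu_b, strength)
```

Checking the core first keeps the function well defined for degenerate trapezoids
such as a == b, where a rising edge of zero width would otherwise divide by zero.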
This chapter is structured in six sections. The first section is this introduction,
which describes the topic of the chapter. The second section reviews related work on
the imbalanced datasets problem. The third section details the development and the
characteristics of our new method (FLAGID). The following section presents the
experimental results obtained by applying the FLAGID method to the Down's syndrome
dataset and compares the accuracy of FLAGID with that of other methods for imbalanced
datasets. Finally, future applications of the method and the conclusions are presented.
2.2 Related Work
The imbalanced dataset classification problem has recently received considerable
attention from the machine learning community [12]. The studies published on this
classification problem can be divided into two main directions. The first one
corresponds to the use of traditional learning methods with some changes to the
dataset and/or the algorithm. The changes applied to the dataset commonly resample
the data of each class in order to balance the number of cases. Those applied to the
algorithm try to avoid the undesirable effects produced by an imbalanced dataset by
introducing some precise changes to