the algorithm, which usually consist of applying different costs (depending on the
class) to the misclassified cases.
The second direction centres on developing new learning methods for imbalance
problems that classic algorithms cannot handle.
The resampling strategy has two approaches: oversampling and undersampling.
Oversampling increases the number of patterns in the minority class (e.g. [3]),
and undersampling decreases the number of examples in the majority class [4]. These two
techniques are the most commonly used because they make it possible to reuse existing
classification methods designed for balanced datasets, such as SVM 523, neural
networks [5], and others [4]. Two examples of SVM-based methods are SDC and KBA.
SDC [6] (SMOTE with Different Costs) combines SVM and SMOTE, using different learning
costs to correct a decision boundary that lies too close to the positive instances.
KBA [7] (Kernel Boundary Alignment) is an SVM-based algorithm that also modifies
the boundary.
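The two resampling approaches described above can be illustrated with a minimal sketch. It uses plain random duplication and random removal, which is simpler than SMOTE (SMOTE generates synthetic patterns rather than copies); the function names and the toy dataset are hypothetical and are not taken from the cited works.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen patterns of the smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class
    is as small as the smallest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out

# Hypothetical toy dataset: 8 negative patterns and 2 positive patterns.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
```

Either balanced set can then be passed to a standard classifier. Note that the oversampled set contains repeated patterns and the undersampled set discards six of the eight negative patterns.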
Another strategy assigns class-dependent costs to misclassified patterns in order
to improve classification accuracy. Domingos [8], Zadrozny and Elkan [9], and
Meler et al. [10], among others, present this technique as a generic modification
of the learning algorithm.
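One common way to apply class-dependent costs is at decision time: given an estimated probability that a pattern is positive, choose the class with the lower expected misclassification cost. The sketch below assumes a binary problem in which correct decisions cost nothing; the function names and cost values are hypothetical, and it is not claimed to reproduce any of the cited methods.

```python
def min_cost_class(p_positive, cost_fn, cost_fp):
    """Return 1 (positive) or 0 (negative), whichever has the lower
    expected cost. Assumes correct decisions have zero cost."""
    # Predicting negative is wrong with probability p_positive.
    expected_cost_if_negative = p_positive * cost_fn
    # Predicting positive is wrong with probability 1 - p_positive.
    expected_cost_if_positive = (1.0 - p_positive) * cost_fp
    return 1 if expected_cost_if_positive < expected_cost_if_negative else 0

def decision_threshold(cost_fn, cost_fp):
    """Probability above which predicting positive is the cheaper choice."""
    return cost_fp / (cost_fp + cost_fn)

# With equal costs, a pattern with p = 0.3 is labelled negative.
equal_costs = min_cost_class(0.3, cost_fn=1.0, cost_fp=1.0)

# If missing a positive is five times as costly as a false alarm,
# the same pattern is labelled positive.
skewed_costs = min_cost_class(0.3, cost_fn=5.0, cost_fp=1.0)
```

Raising the cost of a false negative lowers the decision threshold, which moves the boundary away from the minority (positive) class.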
The above solutions do not always give the desired results; their success depends on
the particular imbalance problem. For this reason, there have been many attempts to
solve the problem with new methods designed specifically for imbalanced datasets [11].
For example, Visa and Ralescu [24] proposed a fuzzy classifier for datasets with class
imbalance and overlap between classes, and Zhang et al. [12] presented a very simple
and effective method, called RLSD (Rule for Learning Skewed Data), to generate rules
from highly imbalanced datasets.
Regarding previous work on Down's syndrome detection using soft computing
methods, we refer to the work of M. Sordo [21], who used RBF networks.
The reported results were 84% true positives and 35% false positives, which are worse
than our results, as will be seen in the Experimental Results section.
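For reference, the true positive and false positive percentages quoted above are rates computed per class from a confusion matrix. The sketch below shows the arithmetic; the counts are hypothetical values chosen only so that the rates come out at 84% and 35%, and they are not data from the cited study.

```python
def rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate)."""
    tpr = tp / (tp + fn)  # fraction of actual positives that are detected
    fpr = fp / (fp + tn)  # fraction of actual negatives flagged as positive
    return tpr, fpr

# Hypothetical counts: 100 positive cases and 1000 negative cases.
tpr, fpr = rates(tp=84, fn=16, fp=350, tn=650)
print(f"TPR = {tpr:.0%}, FPR = {fpr:.0%}")
```

Because each rate is normalised by the size of its own class, the pair remains informative when one class is much larger than the other.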
However, no method or algorithm has been used specifically for the Down's syndrome
problem, because medical experts have openly expressed their disbelief in such
approaches. Their objection to resampling methods, for example, is that these would
either create new patterns that do not exist in real life or erase patterns with small
details that could be relevant to the final solution.
2.3 Proposed Method
The method proposed in this chapter aims to obtain an accurate Fuzzy
Classification System from an imbalanced dataset. It consists of several
steps, which are shown in Fig. 2.1. First, the DDA/RecBF clustering algorithm is used
to obtain an initial set of fuzzy membership functions from the dataset. Then, those
functions are recombined using a special method called ReRecBF (Recombined
RecBF), which is also presented in this chapter. Finally, a Genetic Algorithm
obtains a set of fuzzy rules from the recombined set of membership functions and
the dataset.
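To make the terms "membership function" and "fuzzy rule" concrete, the following sketch evaluates one rule on one pattern. It assumes trapezoidal membership functions and the minimum as the conjunction operator; both are assumptions made for illustration, since this section does not specify them. The feature names and parameter values are hypothetical, and the sketch does not implement DDA/RecBF, ReRecBF, or the Genetic Algorithm.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x for a trapezoid with support [a, d]
    and core [b, c], where a <= b <= c <= d."""
    if b <= x <= c:
        return 1.0                   # inside the core
    if a < x < b:
        return (x - a) / (b - a)     # rising edge
    if c < x < d:
        return (d - x) / (d - c)     # falling edge
    return 0.0                       # outside the support

def rule_strength(pattern, antecedent):
    """Degree to which a pattern satisfies every condition of a rule,
    using the minimum as the fuzzy AND (an assumed choice)."""
    return min(trapezoid(pattern[name], *params)
               for name, params in antecedent.items())

# Hypothetical rule: IF x1 is A AND x2 is B THEN class = positive.
antecedent = {
    "x1": (0.0, 1.0, 2.0, 3.0),
    "x2": (10.0, 20.0, 30.0, 40.0),
}
pattern = {"x1": 0.5, "x2": 25.0}

strength = rule_strength(pattern, antecedent)
```

Here `x1` lies halfway up the rising edge of its membership function and `x2` lies in the core of its own, so the rule fires with strength 0.5.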