to improve the performance of a single classifier through voting techniques. These aggregation methods achieve a good bias-variance trade-off, for the three fundamental reasons explained in [6]. They are divided into two categories. The first category contains methods that merge preset classifiers, such as simple voting [2], weighted voting [2], and weighted majority voting [12]. The second category contains methods that merge classifiers built from the data during training, such as adaptive strategies (Boosting), with AdaBoost [21] as the basic algorithm, or random strategies (Bagging) [3].
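As a minimal illustration of the first category, the sketch below shows simple and weighted majority voting over preset binary classifiers. The callable-classifier interface and the example weights are assumptions made for illustration, not taken from the references above.

```python
import numpy as np

def simple_vote(classifiers, x):
    """Simple (unweighted) majority vote over preset binary classifiers.

    Each classifier maps an example x to a label in {-1, +1}.
    Returns 0 in case of a tie.
    """
    votes = np.array([clf(x) for clf in classifiers])
    return np.sign(votes.sum())

def weighted_vote(classifiers, weights, x):
    """Weighted majority vote: each classifier's vote is scaled by its weight."""
    votes = np.array([w * clf(x) for clf, w in zip(classifiers, weights)])
    return np.sign(votes.sum())

# Illustrative usage with three toy threshold classifiers on a scalar input.
clfs = [lambda x: 1 if x > 0 else -1,
        lambda x: 1 if x > 2 else -1,
        lambda x: 1 if x > -1 else -1]
print(simple_vote(clfs, 1.0))           # majority says +1
print(weighted_vote(clfs, [0.2, 0.7, 0.1], 1.0))
```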
We are interested in Boosting because the comparative study [7] shows that, with little noise, AdaBoost seems to resist overfitting. Indeed, AdaBoost directly optimizes the weighted votes. This observation is supported not only by the fact that the empirical error on the training set decreases exponentially with the iterations, but also by the fact that the generalization error keeps decreasing even after the empirical error has reached its minimum. However, the method is criticized for overfitting and for its speed of convergence, especially in the presence of noise. In the last decade, many studies have focused on the weaknesses of AdaBoost and proposed improvements. The main improvements concern the modification of the weights of the examples [19], [18], [1], [20], [14], [8], the modification of the margin [9], [20], [17], the modification of the classifiers' weights [15], the choice of the weak learner [5], [24], and the speed of convergence [22], [13], [18].
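For reference, here is a minimal sketch of the standard AdaBoost loop that these works modify, showing the example-weight update and the classifiers' weights (the alphas) used in the final weighted vote. The choice of depth-1 decision trees from scikit-learn as weak learners is an illustrative assumption, not dictated by the text above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # decision stumps as weak learners

def adaboost(X, y, T=50):
    """Basic AdaBoost: returns the weak hypotheses and their weights (alphas).

    y must contain labels in {-1, +1}.
    """
    y = np.asarray(y)
    n = len(y)
    D = np.full(n, 1.0 / n)                 # example weights, updated each round
    hypotheses, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = np.sum(D[pred != y])          # weighted training error
        if eps >= 0.5:                      # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        D *= np.exp(-alpha * y * pred)      # increase weight of misclassified examples
        D /= D.sum()                        # renormalize to a distribution
        hypotheses.append(h)
        alphas.append(alpha)
    return hypotheses, alphas

def predict(hypotheses, alphas, X):
    """Final weighted-vote classifier: sign of sum_t alpha_t * h_t(x)."""
    agg = sum(a * h.predict(X) for h, a in zip(hypotheses, alphas))
    return np.sign(agg)
```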
In this paper, we propose a new improvement of the basic Boosting algorithm AdaBoost. The approach exploits the hypotheses generated in the former iterations of AdaBoost to act both on the modification of the example weights and on the modification of the classifiers' weights. By exploiting these former hypotheses, we expect to avoid re-generating the same classifier across different iterations of AdaBoost and, consequently, to improve the speed of convergence.
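The precise mechanism is described in the third section. Purely as an illustration of the general idea of comparing a new weak hypothesis with those kept from earlier rounds, one could imagine a test such as the following; the function name, the agreement criterion, and the tolerance are hypothetical choices and not the method proposed in this paper.

```python
import numpy as np

def is_duplicate(h_new, past_hypotheses, X, tol=0.0):
    """Illustrative check (an assumption, not this paper's method):
    treat a new hypothesis as a duplicate if it agrees with some former
    hypothesis on (almost) every training example."""
    pred_new = h_new.predict(X)
    for h_old in past_hypotheses:
        disagreement = np.mean(pred_new != h_old.predict(X))
        if disagreement <= tol:
            return True
    return False
```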
The remainder of the paper is organized as follows. In the next section, we review the studies whose purpose is to improve Boosting with respect to its weaknesses. In the third section, we describe our improvement of Boosting, which exploits former hypotheses. In the fourth section, we present an experimental study of the proposed improvement, comparing its generalization error, recall, and speed of convergence with AdaBoost on several real datasets. We also study the behavior of the proposed improvement on noisy data, and present comparative experiments with BrownBoost (a method known to improve AdaBoost on noisy data). Lastly, we give our conclusions and perspectives.
3.2 State of the Art
Because of weaknesses observed in the basic boosting algorithm AdaBoost, such as overfitting and the speed of convergence, several researchers have tried to improve it.
We therefore review the main methods whose purpose is to improve boosting with respect to these weaknesses. With this intention, the researchers try