to use the strong points of Boosting, such as the update of the misclassified
examples, the maximization of the margin, the significance of the weights that
AdaBoost associates with each hypothesis and, finally, the choice of the weak learner.
3.2.1
Modification of the Examples' Weight
The adaptive update of the distribution over the examples, which aims at increasing
the weight of those badly learned by the preceding classifier, makes it possible to
improve the performance of any training algorithm. Indeed, at each iteration the
current distribution favors the examples misclassified by the preceding hypothesis,
which is what characterizes the adaptivity of AdaBoost. As a result, several
researchers have proposed strategies that modify the weight update of the examples
in order to avoid overfitting.
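For reference, the standard AdaBoost update that these strategies modify can be written as follows (standard notation, not reproduced from this text: weak hypothesis h_t, weight alpha_t, weighted training error epsilon_t, normalizer Z_t):

```latex
D_{t+1}(i) \;=\; \frac{D_t(i)\,\exp\!\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t},
\qquad
\alpha_t \;=\; \tfrac{1}{2}\,\ln\frac{1-\epsilon_t}{\epsilon_t}
```

Every misclassified example (y_i h_t(x_i) < 0) thus sees its weight multiplied by e^{alpha_t} > 1, which is exactly the growth the methods below try to control.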
For example, MadaBoost [8] limits the weight of each example by its initial
probability. It thereby acts on the uncontrolled growth of the weights of certain
examples (noise), which is the source of overfitting.
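A minimal sketch of this capping idea, in Python (illustrative only; the function name, the uniform initial distribution and the exact update factor are our assumptions, not taken from [8]):

```python
import numpy as np

def madaboost_style_update(weights, initial_weights, y, pred, alpha):
    """One illustrative round of a MadaBoost-like weight update.

    Each weight follows the usual exponential reweighting, but is never
    allowed to exceed the example's initial probability, which curbs the
    runaway growth of the weights of noisy examples.
    """
    new_w = weights * np.exp(-alpha * y * pred)   # standard exponential update
    new_w = np.minimum(new_w, initial_weights)    # MadaBoost-style cap
    return new_w / new_w.sum()                    # back to a distribution

# Assumed toy setup: uniform initial distribution over 5 examples.
init = np.full(5, 1 / 5)
y = np.array([1, -1, 1, 1, -1])       # true labels
pred = np.array([1, 1, -1, 1, -1])    # weak hypothesis outputs
print(madaboost_style_update(init.copy(), init, y, pred, alpha=0.5))
```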
Another approach that makes the boosting algorithm resistant to noise is
BrownBoost [14], an algorithm based on Boost-by-Majority that incorporates
a time parameter. For an appropriate value of this parameter, BrownBoost
is able to avoid overfitting. Another approach, which adapts a logistic
regression model to AdaBoost, is LogitBoost [18].
An approach that produces fewer generalization errors than the traditional
approach, at the cost of a slightly higher training error, is Modest AdaBoost [1].
Its update reduces the contribution of a classifier when it performs "too well"
on the data already correctly classified. This is why the method is called Modest
AdaBoost: it forces the classifiers to be "modest" and to work only in the domain
defined by the distribution.
Another approach, which tries to reduce the effect of overfitting by imposing
limitations on the distributions produced during the boosting process, is
SmoothBoost [20]. In particular, the weight assigned to each individual example
is bounded at each iteration. Thus, noisy data cannot be excessively emphasized
during the iterations, since they can never be assigned extremely large weights.
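A rough illustration of such a per-example bound (our own sketch, not the exact scheme of [20]; the smoothness parameter `kappa` and the clip-and-redistribute loop are assumptions):

```python
import numpy as np

def smooth_distribution(raw_weights, kappa=2.0):
    """Bound every probability by kappa/n, redistributing the excess mass.

    No single example (in particular a noisy one) can receive more than
    kappa times the uniform weight, which keeps the distribution smooth.
    Requires kappa >= 1 so that a valid distribution always exists.
    """
    w = np.asarray(raw_weights, dtype=float)
    n = len(w)
    cap = kappa / n
    w = w / w.sum()
    for _ in range(n):                 # clip and redistribute until stable
        over = w > cap
        if not over.any():
            break
        excess = (w[over] - cap).sum()
        w[over] = cap
        w[~over] += excess * w[~over] / w[~over].sum()
    return w

# Example: the third weight has exploded; the bound smooths it out.
print(smooth_distribution([0.02, 0.03, 0.90, 0.02, 0.03], kappa=2.0))
```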
A last approach, IAdaBoost [19], is based on the idea of building a local
information measurement around each example, making it possible to evaluate the
risk of overfitting, by using a neighborhood graph to measure the information
around each example. Thanks to these measurements, one obtains a function that
expresses how much an example's weight needs updating. This function makes it
possible to handle outliers and cluster centers at the same time.
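The precise information measure belongs to [19]; below is only a rough sketch of the general idea, using the fraction of same-label neighbors in a k-nearest-neighbor graph as the local measure and a simple damping of the update (all names and the damping rule are our assumptions):

```python
import numpy as np

def local_agreement(X, y, k=5):
    """Fraction of each example's k nearest neighbours that share its label.

    A value near 0 suggests an isolated, possibly noisy point (outlier);
    a value near 1 suggests the example lies inside a homogeneous cluster.
    """
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    return (y[neighbours] == y[:, None]).mean(axis=1)

def damped_update(weights, y, pred, alpha, agreement):
    """Scale the usual exponential update by the local agreement, so that
    suspected outliers (low agreement) receive only a muted weight increase."""
    factor = np.exp(-alpha * y * pred)
    new_w = weights * (1.0 + agreement * (factor - 1.0))
    return new_w / new_w.sum()
```

Under this reading, an outlier that keeps being misclassified no longer monopolizes the distribution, while a misclassified example sitting inside a dense, consistently labeled region is still pushed up.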
3.2.2
Modification of the Margin
Certain studies analyzing the behavior of Boosting showed that the generalization
error continues to decrease even when the training error has stabilized. The