explanation is that even if all the training examples are already correctly classified, Boosting tends to keep maximizing the margins [20].
Following this observation, several studies have tried to modify the margin, either by maximizing it or by minimizing it, with the objective of improving Boosting's resistance to overfitting.
Several approaches followed, such as AdaBoostReg [17], which tries either to identify and remove badly labeled examples, or to apply the maximum-margin constraint to examples assumed to be badly labeled by using a soft margin. In the algorithm proposed by [9], the authors use a weighting scheme that relies on a margin function growing less quickly than the exponential function.
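To make this idea concrete, the sketch below (illustrative Python, not the actual algorithm of [9] nor AdaBoostReg; the function names and sample margins are invented) contrasts the exponential weighting implicit in AdaBoost with a logistic-style weighting that grows less quickly as the margin becomes negative, so that badly labeled examples cannot receive arbitrarily large weight.

    import numpy as np

    def exponential_weight(margin):
        # AdaBoost-style weighting: grows exponentially as the margin becomes
        # more negative, so badly classified (or mislabeled) examples quickly
        # dominate the weight distribution.
        return np.exp(-margin)

    def logistic_weight(margin):
        # A slower-growing alternative (logistic-type loss): bounded above by 1,
        # so an example with a very negative margin keeps a moderate weight.
        return 1.0 / (1.0 + np.exp(margin))

    margins = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
    print(exponential_weight(margins))   # approx. [20.09  2.72  1.    0.37  0.05]
    print(logistic_weight(margins))      # approx. [ 0.95  0.73  0.50  0.27  0.05]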
3.2.3 Modification of the Classifiers' Weight
During the performance evaluation of Boosting, researchers questioned the significance of the weights α(t) that AdaBoost associates with the hypotheses it produces.
They noted, in experiments on very simple data, that the generalization error kept decreasing even though the weak learner had already produced all the possible hypotheses. In other words, when a hypothesis appears several times, it finally votes with a weight equal to the sum of all its α(t), a value that could perhaps be set directly. Several researchers therefore hoped to obtain these values by a nonadaptive process, such as LocBoost [15], an alternative to the construction of global mixtures of experts which allows the coefficients α(t) to depend on the data.
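As a reminder of what these weights are, the short sketch below (illustrative Python with invented hypothesis labels and errors) computes the standard AdaBoost coefficient α(t) = ½ ln((1 − ε(t)) / ε(t)) and accumulates the final vote of a hypothesis that the weak learner returns at several rounds.

    import math
    from collections import defaultdict

    def alpha(error):
        # Standard AdaBoost weight for a hypothesis with weighted error `error`.
        return 0.5 * math.log((1.0 - error) / error)

    # Invented run in which the weak learner returns hypothesis "h1" three times.
    rounds = [("h1", 0.30), ("h2", 0.40), ("h1", 0.35), ("h1", 0.45)]

    total_vote = defaultdict(float)
    for name, err in rounds:
        total_vote[name] += alpha(err)

    # A repeated hypothesis finally votes with the sum of all its alpha(t).
    print(dict(total_vote))   # approx. {'h1': 0.83, 'h2': 0.20}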
3.2.4 Choice of Weak Learner
A question that several researchers have raised about Boosting concerns the weak learner: how should this base classifier be chosen?
Much research has therefore focused on the choice of the base classifier of Boosting, such as GloBoost [24], which uses a weak learner that produces only correct hypotheses. RankBoost [5] is another approach, based on a weak learner that accepts ranking functions as input attributes.
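For readers less familiar with the term, the base classifier most commonly handed to a boosting procedure is a decision stump; the sketch below (a minimal illustration in Python, not the specific weak learner of GloBoost or RankBoost) shows what such a weak learner looks like when trained on a weighted sample.

    import numpy as np

    def train_stump(X, y, weights):
        # Minimal decision-stump weak learner: pick the (feature, threshold, sign)
        # that minimizes the weighted error on {-1, +1} labels.
        best = None
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = np.where(X[:, j] <= thr, sign, -sign)
                    err = np.sum(weights[pred != y])
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        return (lambda Z: np.where(Z[:, j] <= thr, sign, -sign)), err

    # Tiny invented example: four one-dimensional points with uniform weights.
    X = np.array([[0.1], [0.4], [0.6], [0.9]])
    y = np.array([1, 1, -1, -1])
    h, err = train_stump(X, y, np.full(4, 0.25))
    print(h(X), err)   # [ 1  1 -1 -1] 0.0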
3.2.5 The Speed of Convergence
In addition to the overfitting problem that Boosting encounters on the modern databases mentioned above, there is another problem: the speed of convergence of Boosting, and of AdaBoost in particular.
Indeed, in the presence of noisy data, the optimal error of the underlying training algorithm is reached only after a long time. In other words, AdaBoost "loses" iterations, and thus time, reweighting examples that in theory deserve no attention, since they are noise.
Research has therefore been carried out to detect these examples and improve the convergence of Boosting, such as iBoost [22], which aims at specializing the weak hypotheses on the examples assumed to be correctly classified.
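To see why these iterations are wasted, the illustrative sketch below (plain Python applying the standard AdaBoost reweighting rule with invented values; it is not iBoost) tracks the weight of a single mislabeled example that every hypothesis gets wrong: after only a few rounds it absorbs most of the weight distribution.

    import numpy as np

    n, alpha = 100, 0.5              # invented: 100 examples, constant alpha(t)
    weights = np.full(n, 1.0 / n)    # start from the uniform distribution
    noisy = 0                        # index of the single mislabeled example

    for t in range(10):
        agreement = np.ones(n)       # all other examples are correctly classified
        agreement[noisy] = -1.0      # the noisy example is always misclassified
        weights *= np.exp(-alpha * agreement)   # AdaBoost reweighting rule
        weights /= weights.sum()                # renormalize to a distribution
        print(t + 1, round(float(weights[noisy]), 3))
        # the noisy example's share rises from 1% to roughly 60% by round 5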
 