The modification within the algorithm is made in two ways.
The first way concerns the update of the weights of the examples: at each iteration, this strategy relies on the opinion of the experts already built (the hypotheses of the former iterations) to update the weights of the examples.
In fact, we compare the real class not only with the class predicted by the hypothesis of the current iteration, but also with the weighted sum of the hypotheses from the first iteration up to the current one. If this combined vote disagrees with the real class, an exponential update, as in AdaBoost, is applied to the misclassified example. This modification therefore lets the algorithm focus only on the examples that are either misclassified or not yet classified. Results improving the speed of convergence are thus expected, as well as a reduction of the generalization error, because of the richness of the space of hypotheses at each iteration.
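As a concrete illustration, the following Python sketch shows one possible reading of this first modification; the function name, the {-1, +1} label convention, and the use of the current coefficient alpha_t in the exponential update are assumptions of the sketch, not details taken from the original algorithm.

```python
import numpy as np

def update_weights(w, alphas, hypotheses, alpha_t, h_t, X, y):
    """Sketch of the modified weight update: an example is treated as
    misclassified when the weighted vote of all hypotheses produced so far
    (former iterations plus the current one) disagrees with its real class,
    and only those examples receive the AdaBoost-style exponential increase.
    Labels y are assumed to be in {-1, +1}; names are illustrative."""
    combined = np.zeros(len(y))
    for a, h in zip(alphas + [alpha_t], hypotheses + [h_t]):
        combined += a * h(X)                      # weighted sum of hypotheses so far
    misclassified = np.sign(combined) != y        # combined vote vs. real class
    w = np.where(misclassified, w * np.exp(alpha_t), w)
    return w / w.sum()                            # renormalise to a distribution
```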
The second way concerns the computation of the error ε(t) of the hypothesis at iteration t: this other strategy focuses instead on the coefficient α(t) of the classifier (hypothesis) at each iteration.
In fact, this coefficient depends on the apparent error ε(t). At each iteration, the method takes into account the hypotheses preceding the current iteration in the calculation of ε(t). The apparent error at each iteration is therefore the weight of the examples voted as misclassified by the weighted hypotheses of the former iterations, compared with the real class.
Results improving the generalization error are expected, since the vote of each hypothesis (coefficient α(t)) is calculated from the other hypotheses.
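A minimal sketch of this second modification, under the same assumptions ({-1, +1} labels, illustrative names), could look as follows; the AdaBoost-style formula deriving α(t) from the apparent error is an assumption of the sketch.

```python
import numpy as np

def apparent_error_and_alpha(w, alphas, hypotheses, X, y):
    """Sketch of the modified apparent error epsilon(t) and coefficient alpha(t):
    epsilon(t) is the total weight of the examples voted as misclassified by the
    weighted hypotheses of the former iterations, compared with the real class.
    Labels y are assumed to be in {-1, +1}."""
    combined = np.zeros(len(y))
    for a, h in zip(alphas, hypotheses):
        combined += a * h(X)                      # vote of the former hypotheses
    epsilon_t = w[np.sign(combined) != y].sum()   # weight of badly voted examples
    # AdaBoost-style coefficient computed from this apparent error
    alpha_t = 0.5 * np.log((1.0 - epsilon_t) / max(epsilon_t, 1e-12))
    return epsilon_t, alpha_t
```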
3.4 Experiments
The objective of this part is to compare our new approach, and especially its contribution, with the original AdaBoost, and to extend this comparison by choosing an improved version of AdaBoost, BrownBoost [14].
Our choice of BrownBoost was based on its robustness against the problem of noisy data. In fact, BrownBoost is an adaptive algorithm which uses a weighting function that depends on the number of iterations k (the execution time), the current iteration i, the number of times r that the example has already been correctly predicted, and the probability of success 1/2 + γ,

\[
\alpha_r = \binom{k-i-1}{k/2 - r}\left(\frac{1}{2}+\gamma\right)^{k/2 - r}\left(\frac{1}{2}-\gamma\right)^{k/2 - i - 1 + r},
\]

instead of the exponential function.
So, by a good estimation of the parameter k, BrownBoost is capable of avoiding overfitting. The advantage of this approach is that the noisy data are detected at some point and their weights stop rising.
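For illustration, a direct transcription of this weighting function into Python might look as follows; the integer rounding of k/2 and the handling of out-of-range values of r are assumptions of this sketch, not details given in the text.

```python
from math import comb

def brownboost_weight(k, i, r, gamma):
    """Sketch of the weighting function quoted above, used instead of the
    exponential function:
      k     - total number of iterations (execution time)
      i     - current iteration
      r     - number of times the example has already been correctly predicted
      gamma - edge of the weak learner (probability of success 1/2 + gamma)
    The rounding of k/2 (integer division here) is an assumption."""
    half = k // 2
    successes_needed = half - r        # correct predictions still needed
    rounds_left = k - i - 1            # remaining iterations
    if successes_needed < 0 or successes_needed > rounds_left:
        return 0.0                     # the final majority vote is already decided
    return (comb(rounds_left, successes_needed)
            * (0.5 + gamma) ** successes_needed
            * (0.5 - gamma) ** (rounds_left - successes_needed))
```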
The comparison criteria chosen in this article are the error rate, the recall, the p-value, the average gain compared to AdaBoost, the speed of convergence, and the sensitivity to noise.