of the baseline methods at all noise levels. These differences are also statistically
significant, as reflected by the low p-values. Only at some very low noise levels (5%
and 10% for C4.5, and 5% for 5-NN) are the results of the OVO and the non-OVO
versions statistically equivalent; note that even then the OVO decomposition does not
hinder the results, it simply does not lower the loss any further.
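To make the comparison concrete, the following minimal sketch shows how an "OVO version" of a base learner can be built. It uses scikit-learn (our choice of library, not the one used in the book's experiments): OneVsOneClassifier trains one binary classifier per pair of classes and aggregates their votes. DecisionTreeClassifier stands in for C4.5 here, which is an assumption, since scikit-learn implements CART rather than C4.5.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Baseline (non-OVO) 5-NN vs. its OVO decomposition.
base_5nn = KNeighborsClassifier(n_neighbors=5)
ovo_5nn = OneVsOneClassifier(KNeighborsClassifier(n_neighbors=5))

# CART as a stand-in for C4.5 (assumption noted above).
ovo_tree = OneVsOneClassifier(DecisionTreeClassifier())

X, y = load_iris(return_X_y=True)
ovo_5nn.fit(X, y)
print(ovo_5nn.predict(X[:3]))
```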
These results also show that OVO achieves more accurate predictions when dealing
with pairwise class noise; however, in terms of robustness when noise only affects
one class, it is not as advantageous with C4.5 or RIPPER as it is with 5-NN.
For example, the behavior of RIPPER with this noise scheme can be related to the
hierarchical way in which its rules are learned: it starts by learning rules for the class
with the fewest examples and proceeds towards the classes with more examples.
Introducing this type of noise may change RIPPER's training order, but the remaining
part of the majority class can still be learned properly, since it now has higher priority.
Moreover, the original second majority class, which now contains the noisy examples,
will probably be the last one learned, and its rules will depend on how the rest of the
classes have been learned. When the problem is decomposed with OVO, a considerable
number of the binary classifiers (those involving the majority and the second majority
classes) suffer a notable quantity of noise, and hence the tendency to predict the
original majority class decreases. When the noise level is high, this strongly affects
the accuracy, since the majority class has the greatest influence on it.
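Since the pairwise class noise scheme drives this whole analysis, here is a minimal sketch of how such noise can be injected: relabel a given percentage of the majority-class examples as the second majority class. The function name and the use of NumPy are our own; the book does not prescribe an implementation.

```python
import numpy as np

def add_pairwise_class_noise(y, noise_level, rng=None):
    """Relabel noise_level% of the majority-class examples as the
    second majority class (pairwise class noise, as described above).
    y is a 1-D array of class labels."""
    rng = np.random.default_rng(rng)
    y = y.copy()
    classes, counts = np.unique(y, return_counts=True)
    order = np.argsort(counts)[::-1]           # classes by decreasing size
    majority, second = classes[order[0]], classes[order[1]]
    maj_idx = np.flatnonzero(y == majority)
    n_noisy = int(round(len(maj_idx) * noise_level / 100))
    noisy = rng.choice(maj_idx, size=n_noisy, replace=False)
    y[noisy] = second                          # corrupt only the majority class
    return y
```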
In contrast with the rest of the noise schemes, under the pairwise noise scheme all the
data sets have different real percentages of noisy examples at the same nominal noise
level of x%. This is because each data set has a different proportion of examples in the
majority class, and thus a noise level of x% does not affect all the data sets in the same way.
In this case, the percentage of noisy examples at a noise level of x% is computed as
(x · N_maj)/100, where N_maj is the percentage of examples of the majority class.
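As a quick check of this formula, a tiny sketch (the helper name is ours):

```python
def effective_noise(x, n_maj):
    """Real percentage of noisy examples under pairwise class noise.
    x:     nominal noise level, in percent
    n_maj: percentage of examples in the majority class"""
    return x * n_maj / 100

# e.g. if the majority class covers 40% of a data set, a nominal 20%
# noise level corrupts only (20 * 40) / 100 = 8% of all examples:
print(effective_noise(20, 40))  # -> 8.0
```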
5.5.5.2 Second Scenario: Data Sets with Attribute Noise
In this section, the performance and robustness of the classification algorithms using
OVO, in comparison with their non-OVO versions, are analyzed when dealing with data
with attribute noise. The test accuracy, RLA results, and p-values of each classification
algorithm at each noise level are shown in Table 5.9.
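The RLA (relative loss of accuracy) values referred to throughout are commonly defined, in this line of work, as the accuracy lost at noise level x% relative to the noise-free accuracy; a minimal sketch under that assumption:

```python
def rla(acc_clean, acc_noisy):
    """Relative Loss of Accuracy at a given noise level x%:
    RLA = (Acc_0% - Acc_x%) / Acc_0%.
    Lower values indicate a more robust classifier."""
    return (acc_clean - acc_noisy) / acc_clean

print(rla(0.80, 0.60))  # ~0.25: the method lost 25% of its clean accuracy
```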
In the case of uniform attribute noise, the test accuracy of the methods using OVO
is always statistically better at all the noise levels. The RLA values of the methods
using OVO are lower than those of the baseline methods at all noise levels, except in
the case of C4.5 with a 5% noise level. Regarding the p-values, a clear tendency is
observed: the p-value decreases as the noise level increases for all the algorithms.
With all methods (C4.5, RIPPER, and 5-NN), the p-values of the RLA results at the
lowest noise levels (up to 20-25%) show that the robustness of the OVO and non-OVO
methods is statistically equivalent. From that point on, the OVO versions statistically
outperform the non-OVO ones. Therefore, the usage of OVO is clearly advantageous
in terms of accuracy and robustness when dealing with data affected by uniform
attribute noise.
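For completeness, a sketch of one common reading of uniform attribute noise, in which each attribute value is corrupted with probability x% and replaced by a value drawn uniformly from that attribute's observed range. The exact corruption protocol used in the experiments may differ; the helper below is illustrative only.

```python
import numpy as np

def add_uniform_attribute_noise(X, noise_level, rng=None):
    """Corrupt each attribute value with probability noise_level%,
    replacing it with a value drawn uniformly from that attribute's
    observed [min, max] range. X is a 2-D numeric array."""
    rng = np.random.default_rng(rng)
    X = X.astype(float).copy()
    mask = rng.random(X.shape) < noise_level / 100
    lo, hi = X.min(axis=0), X.max(axis=0)      # per-attribute domains
    noise = rng.uniform(lo, hi, size=X.shape)  # candidate random values
    X[mask] = noise[mask]                      # overwrite selected cells
    return X
```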