by the three cases of the cHyper problem. As stated in the previous section, the poor performance of Fix and SEA on evolving data is worth noting. These observations are further validated by the results obtained with the Stagger problem, which essentially follow the ones reported in Table 3.
Finally, Table 4 outlines the resources required by the systems. The memory requirements were measured with the NetBeans 6.8 Profiler. Single requires less memory than the ensemble methods, which need an amount of memory that is essentially linear in the number of classifiers stored in the ensemble. The different nature of the two classes of systems influences this value. The average memory required by our system is slightly higher than that of the others, since our system manages two different structures, as explained at the end of Section 3.3. The run-time behavior confirms this trend; in this case, the drift detection approach influences the execution time of a method. Let us compare the bagging method Oza with DWM, SEA64 and ASE. These tests highlight that incremental single-model systems are faster than ensemble ones, since they only have to update one model. On the other hand, considering accuracy, single-model systems rarely provide the best average values. Finally, Oza guarantees an appreciable reliability on every data set, but its execution time is definitely higher than that of the others.
Table 4. cHyper: time and memory required

                decision tree            naïve Bayes
            avg used    run time    avg used    run time
            heap (KB)   (sec.)      heap (KB)   (sec.)
ASE           9276        82.40       7572        27.42
SE            9233        80.80       7894        27.45
Fix64         8507        47.54       5317        23.82
SEA64         7980       152.07       5371        97.76
DWM64         5111        77.56       5137        21.21
Oza64        10047       393.93       6664       290.24
Single        5683        11.54       5399         8.26
Figure 6a shows the results obtained on the Cyclic problem. They are presented for the naïve Bayes approach, analyzing different ratios between the chunk size and the number of elements to classify. As shown in Figure 6a, even in this case our ASE approach is in line with SE0.1 and better than the others. Since this problem presents recurring concepts, our approach can exploit the selective ensemble better than the others: models that are currently out of context are not deleted by the system, but simply disabled. If a concept becomes valid again, the model can be reactivated. This behavior holds even in the case of the adaptive approach.
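The disable-and-reactivate mechanism described above can be illustrated with a minimal sketch. All class and method names below (SelectiveEnsemble, ThresholdModel, the accuracy threshold) are hypothetical and chosen for illustration; they are not the authors' implementation.

```python
# Hypothetical sketch: a selective ensemble whose out-of-context members
# are disabled (kept in memory) rather than deleted, so a recurring
# concept can reactivate a previously trained model.

class ThresholdModel:
    """Toy classifier: predicts 1 when x >= split, else 0."""
    def __init__(self, split):
        self.split = split

    def predict(self, x):
        return 1 if x >= self.split else 0

    def score(self, xs, ys):
        hits = sum(self.predict(x) == y for x, y in zip(xs, ys))
        return hits / len(xs)


class SelectiveEnsemble:
    def __init__(self, accuracy_threshold=0.8):
        self.members = []  # each member keeps its model plus an enabled flag
        self.threshold = accuracy_threshold

    def add(self, model):
        self.members.append({"model": model, "enabled": True})

    def update(self, xs, ys):
        # Re-evaluate every member on the latest chunk and toggle its
        # state; nothing is removed, so past concepts stay available.
        for m in self.members:
            m["enabled"] = m["model"].score(xs, ys) >= self.threshold

    def predict(self, x):
        # Majority vote over the currently enabled members only.
        votes = [m["model"].predict(x) for m in self.members if m["enabled"]]
        return max(set(votes), key=votes.count)


ens = SelectiveEnsemble()
ens.add(ThresholdModel(split=5))  # fits "concept A" (label 1 iff x >= 5)
ens.add(ThresholdModel(split=2))  # fits "concept B" (label 1 iff x >= 2)

# Chunk drawn from concept A: the second model falls below threshold.
ens.update([1, 3, 7, 9], [0, 0, 1, 1])
print([m["enabled"] for m in ens.members])  # [True, False]

# Concept B recurs: the second model is reactivated, not retrained.
ens.update([1, 3, 7, 9], [0, 1, 1, 1])
print([m["enabled"] for m in ens.members])  # [False, True]
```

The point of the sketch is the update step: a member whose chunk accuracy drops is only flagged as disabled, so when its concept recurs it resumes voting immediately instead of being relearned from scratch.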
We conclude this section with the results obtained on the KddCup99 problem, using the decision tree approach. In this case, only one execution is run over the whole data set. As shown in Figure 6b, the approaches employing an advanced method to keep track of concept drift achieve an accuracy in line with the ones obtained by Aggarwal et al. in [3]. Even in this case, ASE performs comparably to SE0.1, showing that the adaptive behavior guarantees a good level of reliability. The run-time requirements for analysing the KddCup99 dataset are in line with the ones reported in Table 4.