by the three cases of the cHyper problem. As stated in the previous section, it is worth
observing the poor performance of Fix and SEA in the case of evolving data. These
observations are further validated by the results obtained with the Stagger problem,
which essentially follow those reported in Table 3.
Finally, Table 4 outlines the resources required by the systems. The memory requirements
were measured using the NetBeans 6.8 Profiler. Single requires less memory than the
ensemble methods, which need an amount of memory that is essentially linear in the
number of classifiers stored in the ensemble; the different nature of the two classes of
systems explains this difference. The average memory required by our system is slightly
higher than that of the others, since it manages two different structures, as discussed
at the end of Section 3.3. The run time behavior confirms this trend, and here the drift
detection approach also influences the execution time of a method: compare, for instance,
the bagging method Oza with DWM, SEA 64 and ASE. These tests highlight that incremental
single-model systems are faster than ensemble ones, since they only have to update one
model; on the other hand, in terms of accuracy, single-model systems rarely provide the
best average values. Finally, Oza guarantees an appreciable reliability on every data set,
but its execution time is definitely higher than that of the others.
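As a rough illustration of why both per-example update cost and memory grow with the number of stored classifiers, consider the following sketch. It is not taken from any of the systems compared in Table 4; the interface and class names (IncrementalClassifier, SimpleEnsemble) are hypothetical.

// Minimal sketch, assuming a generic incremental base learner:
// an ensemble forwards each new example to every stored member,
// so per-example work and retained memory grow roughly linearly
// with the ensemble size, while a single incremental learner
// updates exactly one model.
import java.util.ArrayList;
import java.util.List;

interface IncrementalClassifier {
    void update(double[] features, int label);   // one incremental training step
    int predict(double[] features);               // binary label 0/1 assumed
}

class SimpleEnsemble {
    private final List<IncrementalClassifier> members = new ArrayList<>();

    void add(IncrementalClassifier c) { members.add(c); }

    // O(k) work per example, where k = members.size()
    void update(double[] x, int y) {
        for (IncrementalClassifier c : members) {
            c.update(x, y);
        }
    }

    // Majority vote over the stored members
    int predict(double[] x) {
        int votesForOne = 0;
        for (IncrementalClassifier c : members) {
            votesForOne += c.predict(x);
        }
        return (votesForOne * 2 >= members.size()) ? 1 : 0;
    }
}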
Table 4. cHyper_c: time and memory required

                      decision tree                      naïve Bayes
            avg used heap (KB)  run time (sec.)  avg used heap (KB)  run time (sec.)
  ASE              9276              82.40              7572              27.42
  SE               9233              80.80              7894              27.45
  Fix 64           8507              47.54              5317              23.82
  SEA 64           7980             152.07              5371              97.76
  DWM 64           5111              77.56              5137              21.21
  Oza 64          10047             393.93              6664             290.24
  Single           5683              11.54              5399               8.26
Figure 6a shows the results obtained on the Cyclic problem. They are presented for the
naïve Bayes approach, analyzing different ratios between the chunk size and the number of
elements to classify. As shown in Figure 6a, even in this case our ASE approach is in line
with SE 0.1 and better than the others. Since this problem presents recurring concepts, our
approach can exploit the selective ensemble better than the others: models that are
currently out of context are not deleted by the system, but simply disabled, and if a
concept becomes valid again the corresponding model can be reactivated. This behavior also
holds in the case of the adaptive approach.
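The disable-rather-than-delete idea described above can be sketched as follows. This is only an illustration of the mechanism, assuming a generic base model; the class and member names (SelectiveEnsemble, Member, active) are hypothetical and not taken from the ASE implementation.

// Minimal sketch: poorly performing members are kept but switched off,
// so they can be turned back on when a recurring concept reappears.
import java.util.ArrayList;
import java.util.List;

class Member {
    final Object model;      // a trained base classifier
    boolean active = true;   // disabled members are kept but ignored in voting

    Member(Object model) { this.model = model; }
}

class SelectiveEnsemble {
    private final List<Member> members = new ArrayList<>();

    void add(Object model) { members.add(new Member(model)); }

    // A member that is out of context is only deactivated, not deleted.
    void deactivate(int index) { members.get(index).active = false; }

    // If its concept becomes valid again, the stored model is reactivated.
    void reactivate(int index) { members.get(index).active = true; }

    // Only active members take part in the combined prediction.
    List<Object> activeModels() {
        List<Object> result = new ArrayList<>();
        for (Member m : members) {
            if (m.active) {
                result.add(m.model);
            }
        }
        return result;
    }
}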
We conclude this section by presenting the results obtained on the KddCup99 problem with
the decision tree approach. In this case, only a single execution over the whole data set
is performed. As shown in Figure 6b, the approaches employing an advanced method to keep
track of concept drift achieve an accuracy in line with the results obtained by Aggarwal
et al. in [3]. Even in this case, ASE achieves a performance comparable with SE 0.1,
showing that the adaptive behavior guarantees a good level of reliability. The run time
required for analysing the KddCup99 data set is in line with the values reported in Table 4.