by the three cases of the cHyper problem. As stated in the previous section, the poor performance of Fix and SEA on evolving data is worth noting. These observations are further validated by the results obtained with the Stagger problem, which essentially follow the ones reported in Table 3.
Finally, Table 4 outlines the resources required by the systems. The memory requirements were measured with the NetBeans 6.8 Profiler. Single requires less memory than the ensemble methods, which need an amount of memory that is essentially linear in the number of classifiers stored in the ensemble. The different nature of the two classes of systems influences this value. The average memory required by our system is slightly higher than that of the others, since our system manages two different structures, as explained at the end of Section 3.3. The run-time behavior confirms this trend; in this case, the drift detection approach influences the execution time of a method. Let us compare the bagging method Oza with DWM, SEA64 and ASE. These tests highlight that incremental single-model systems are faster than ensemble ones, since they only have to update one model. On the other hand, considering accuracy, single-model systems rarely provide the best average values. Finally, Oza guarantees an appreciable reliability on every data set, but its execution time is definitely higher than that of the others.
Table 4. cHyper: time and memory required

                decision tree            naïve Bayes
            avg used    run time    avg used    run time
            heap (KB)   (sec.)      heap (KB)   (sec.)
ASE           9276        82.40       7572        27.42
SE            9233        80.80       7894        27.45
Fix64         8507        47.54       5317        23.82
SEA64         7980       152.07       5371        97.76
DWM64         5111        77.56       5137        21.21
Oza64        10047       393.93       6664       290.24
Single        5683        11.54       5399         8.26
Figure 6a shows the results obtained on the Cyclic problem. They are presented for the naïve Bayes approach, analyzing different ratios between the chunk size and the number of elements to classify. As shown in Figure 6a, even in this case our ASE approach is in line with SE0.1 and better than the others. Since this problem presents recurring concepts, our approach can exploit the selective ensemble better than the others: models that are currently out of context are not deleted by the system, but simply disabled. If a concept becomes valid again, the model can be reactivated. This behavior holds even in the case of the adaptive approach.
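The disable-and-reactivate mechanism described above can be illustrated with a minimal sketch. All class and method names below (SelectiveEnsemble, ThresholdModel, the accuracy threshold) are hypothetical and chosen for illustration; they are not the authors' implementation.

```python
# Hypothetical sketch: a selective ensemble whose out-of-context members
# are disabled (kept in memory) rather than deleted, so a recurring
# concept can reactivate a previously trained model.

class ThresholdModel:
    """Toy classifier: predicts 1 when x >= split, else 0."""
    def __init__(self, split):
        self.split = split

    def predict(self, x):
        return 1 if x >= self.split else 0

    def score(self, xs, ys):
        hits = sum(self.predict(x) == y for x, y in zip(xs, ys))
        return hits / len(xs)


class SelectiveEnsemble:
    def __init__(self, accuracy_threshold=0.8):
        self.members = []  # each member keeps its model plus an enabled flag
        self.threshold = accuracy_threshold

    def add(self, model):
        self.members.append({"model": model, "enabled": True})

    def update(self, xs, ys):
        # Re-evaluate every member on the latest chunk and toggle its
        # state; nothing is removed, so past concepts stay available.
        for m in self.members:
            m["enabled"] = m["model"].score(xs, ys) >= self.threshold

    def predict(self, x):
        # Majority vote over the currently enabled members only.
        votes = [m["model"].predict(x) for m in self.members if m["enabled"]]
        return max(set(votes), key=votes.count)


ens = SelectiveEnsemble()
ens.add(ThresholdModel(split=5))  # fits "concept A" (label 1 iff x >= 5)
ens.add(ThresholdModel(split=2))  # fits "concept B" (label 1 iff x >= 2)

# Chunk drawn from concept A: the second model falls below threshold.
ens.update([1, 3, 7, 9], [0, 0, 1, 1])
print([m["enabled"] for m in ens.members])  # [True, False]

# Concept B recurs: the second model is reactivated, not retrained.
ens.update([1, 3, 7, 9], [0, 1, 1, 1])
print([m["enabled"] for m in ens.members])  # [False, True]
```

The point of the sketch is the update step: a member whose chunk accuracy drops is only flagged as disabled, so when its concept recurs it resumes voting immediately instead of being relearned from scratch.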
We conclude this section with the results obtained on the KddCup99 problem, using the decision tree approach. In this case, only one execution is run over the whole data set. As shown in Figure 6b, the approaches employing an advanced method to keep track of concept drift achieve an accuracy in line with the ones obtained by Aggarwal et al. in [3]. Even in this case, ASE performs comparably to SE0.1, showing that the adaptive behavior guarantees a good level of reliability. The run-time requirements for analysing the KddCup99 dataset are in line with the ones reported in Table 4.