SEA: A complete description and evaluation of this system can be found in [30]. In this approach, classifiers are not deleted indiscriminately: their management is based on a weight measure related to model reliability. This method represents a special case of our selective ensemble in which only one level is defined.
DWM: This system is introduced in [24,25]. The approach implemented here takes a set of data as input and uses a batch classifier as the base learner. Weight management is also employed but, differently from SEA, every classifier is assigned a weight when it is created; every time the classifier makes a mistake, its weight is decreased (a weight-update sketch follows these descriptions).
Oza: This system implements the online bagging method of Oza and Russell [26] with the addition of the ADWIN technique [5] as a change detector and as an estimator of the weights of the boosting method (an online bagging sketch also follows these descriptions).
Single: This approach employs an incremental single model with the EDDM [13,4] technique for drift detection. Both Oza and Single were tested using the ASHoeffdingTree and naïve Bayes models available in MOA.
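The weight handling described for DWM can be illustrated with a small sketch. This is only a simplified illustration, not the full algorithm of [24,25]: the decay factor beta, the pruning threshold, and the predict() interface of the base learners are assumptions made for the example, and the creation of new experts and the weight normalization performed by the real system are omitted.

# Minimal sketch of DWM-style weight handling (illustration only, not the
# full algorithm of [24,25]); beta, the pruning threshold and the classifier
# interface are assumptions made for this example.
class WeightedExpert:
    def __init__(self, classifier):
        self.classifier = classifier   # any base learner with a predict() method
        self.weight = 1.0              # every classifier starts with weight 1

def update_weights(experts, x, y, beta=0.5, threshold=0.01):
    """Decrease the weight of every expert that misclassifies (x, y)."""
    for e in experts:
        if e.classifier.predict(x) != y:
            e.weight *= beta           # a mistake causes a multiplicative decay
    # experts whose weight has become negligible can be dropped
    return [e for e in experts if e.weight >= threshold]

def weighted_prediction(experts, x, labels):
    """Weighted vote over the ensemble members."""
    scores = {c: 0.0 for c in labels}
    for e in experts:
        scores[e.classifier.predict(x)] += e.weight
    return max(scores, key=scores.get)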
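The core of Oza and Russell's online bagging [26] used by the Oza system can be summarized as follows. The sketch shows only the Poisson(1) resampling step; the ADWIN change detector of the actual system is omitted, and the learn_one() interface of the base models is an assumption.

import math
import random

# Core of Oza and Russell's online bagging [26]: each incoming example is
# presented to every base learner k times, with k drawn from Poisson(1).
# The ADWIN change detector of the actual system is omitted, and the
# learn_one() interface of the base models is an assumption.
def poisson_1(rng=random):
    """Knuth's method for sampling from Poisson(lambda = 1)."""
    limit = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def online_bagging_update(models, x, y):
    """Present (x, y) to every base model k ~ Poisson(1) times."""
    for model in models:
        for _ in range(poisson_1()):
            model.learn_one(x, y)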
4.3 Results
All the experiments were run on a PC with an Intel E8200 Dual-Core processor and 4 GB of RAM, using Linux Fedora 10 (kernel 2.6.27) as the operating system. Our experiments consider a frame with 8 levels of capacity 3, where every high-order snapshot is built by merging 2 snapshots (one possible organization of such a frame is sketched after this paragraph). This frame size is large enough to include snapshots that represent large portions of the data at the higher levels. For each level, an ensemble of 8 classifiers was used.
The tests were conducted comparing the use of naïve Bayes (NB) and decision tree (DT) base classifiers. In all cases, we compare our Selective Ensemble (SE), with a fixed model activation threshold set to 0.1 and to 0.25, with our Adaptive Selection Ensemble (ASE). For each data generator, a collection of 100 training sets (and corresponding test sets) is randomly generated according to the features outlined in Table 1. Every system is run, and the average accuracy and 95% confidence interval are reported. Each test consists of a set of 100 observations. All the statistics reported are computed from the results obtained.
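The frame of snapshots used in the experiments (8 levels of capacity 3, with every high-order snapshot built by merging 2 snapshots) can be organized, for example, as in the sketch below. The promotion policy shown here, merging the two oldest snapshots of a level when its capacity is exceeded, is an assumption made for this sketch; the actual system may manage the frame differently.

# One possible organization of the snapshot frame: 8 levels, each holding at
# most 3 snapshots, where a higher-order snapshot is built by merging 2
# snapshots of the level below.  Merging the two oldest snapshots when a
# level overflows is an assumption made for this illustration.
NUM_LEVELS = 8
LEVEL_CAPACITY = 3
MERGE_SIZE = 2

def insert_snapshot(frame, snapshot, merge, level=0):
    """Insert a snapshot at `level`, promoting merged snapshots upward."""
    if level >= NUM_LEVELS:
        return                          # nothing is kept beyond the last level
    frame[level].append(snapshot)
    if len(frame[level]) > LEVEL_CAPACITY:
        oldest = [frame[level].pop(0) for _ in range(MERGE_SIZE)]
        insert_snapshot(frame, merge(oldest), merge, level + 1)

# Example usage with a trivial merge that concatenates the underlying data.
frame = [[] for _ in range(NUM_LEVELS)]
for batch in range(10):
    insert_snapshot(frame, [batch], merge=lambda pair: pair[0] + pair[1])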
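The reported statistics, average accuracy and 95% confidence interval over the randomly generated data sets, can be computed as in the short sketch below. The use of the normal approximation (1.96 times the standard error) is an assumption; the paper does not state which interval construction is used.

import statistics

# Average accuracy and 95% confidence half-width over repeated runs, using
# the normal approximation (1.96 * standard error); which interval the paper
# actually uses is not stated, so this construction is an assumption.
def mean_and_ci95(accuracies):
    n = len(accuracies)
    mean = statistics.fmean(accuracies)
    std_err = statistics.stdev(accuracies) / n ** 0.5
    return mean, 1.96 * std_err

# Example with placeholder accuracy values from repeated runs of one system.
accs = [0.91, 0.93, 0.90, 0.92]
mean, half_width = mean_and_ci95(accs)
print(f"accuracy = {mean:.3f} +/- {half_width:.3f}")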
Results with Stable Data Sets. The results obtained with stable data sets confirm that the drift detection approach provided by each system does not heavily influence its overall accuracy. On the LED24 and Hyper problems, all the systems achieve quite accurate results. Table 2 reports the results obtained with the Hyper data sets using the naïve Bayes approach. These results can be compared with those provided in Table 3 in Section 4.3, where the concept drifting problem is added to the same type of data.
It is worth observing that there are no significant differences between the results obtained by the SE approach when the model activation threshold is varied. The new ASE approach provides results in line with the best ones. The adaptive behavior mechanism does not negatively influence the reliability of the system in the case of stable data streams; on the contrary, the new approach enables better exploitation of the ensemble.
Moreover, Table 2 highlights that the Single model requires a large quantity of data to provide good performance. Finally, Fix 64 and SEA 64 provide good results that, compared with the ones obtained by the same systems analyzing the cHyper and Cyclic
 