example of data-based solutions. In contrast, approximation algorithms such as those introduced in [15,10] are examples of task-based techniques.
In the context of data stream classification, two main approaches can be outlined, namely instance selection and ensemble learning. Very Fast Decision Trees (VFDT) [9], together with its improvements [22,14,27] for reacting to concept drift and handling numerical attributes, is an example of an instance selection method. In particular, the Hoeffding bound guarantees that the split attribute chosen using n examples is, with high probability, the same as the one that would be chosen using an infinite set of examples (see the sketch after this paragraph). Last et al. [8] propose another strategy, using an info-fuzzy technique to adjust the size of a data window. Ensemble learning employs multiple classifiers, continuously extracting new models from scratch and deleting out-of-date ones. On-line approaches for bagging and boosting are available in [26,7,5]. Further methods are presented in [30,34,29,25,11], where ensembles of weighted classifiers, including an adaptive genetic programming boosting approach [11], are employed to cope with concept drift.
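To make the role of the Hoeffding bound concrete, the following Python sketch computes the standard bound epsilon = sqrt(R^2 ln(1/delta) / (2n)) and applies the VFDT-style split test; the function names and the choice of delta and n are illustrative, not taken from [9].

    import math

    def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
        # With probability at least 1 - delta, the true mean of a random
        # variable with range `value_range` lies within epsilon of the
        # mean observed over n independent examples.
        return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

    def should_split(best_gain: float, second_gain: float,
                     value_range: float, delta: float, n: int) -> bool:
        # VFDT-style test: split once the gap between the two best
        # attributes exceeds the bound, so the attribute chosen on n
        # examples matches, with probability 1 - delta, the one that
        # would be chosen on infinite data.
        return best_gain - second_gain > hoeffding_bound(value_range, delta, n)

    # Illustrative values: information gain ranges over [0, log2(C)],
    # hence value_range = 1.0 for a binary class.
    print(hoeffding_bound(value_range=1.0, delta=1e-7, n=5000))  # ~0.040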
Neither of the two techniques can be assumed to be more appropriate than the other. [5] provides a comparison between different techniques not only in terms of accuracy, but also in terms of computational features, such as the memory and time required by each system. By contrast, our approach proposes an ensemble learning scheme that differs from the cited methods in that it is designed to manage different sliding windows concurrently, enabling the use of a set of classifiers that is neither necessarily contiguous nor constant over time (a minimal sketch of this idea follows).
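To illustrate this point only, consider the toy sketch below, in which classifiers are bound to possibly non-contiguous windows and can be added or dropped independently; the WindowedEnsemble class and its interface are hypothetical and do not reproduce the actual design of [19,18], which Section 3 summarizes.

    class WindowedEnsemble:
        # Toy ensemble keyed by (start, end) sliding windows: members can
        # be added or dropped independently, so the active set need not
        # cover contiguous windows nor stay constant over time.
        def __init__(self):
            self.members = {}  # window (start, end) -> (classifier, weight)

        def add(self, window, classifier, weight=1.0):
            self.members[window] = (classifier, weight)

        def drop(self, window):
            self.members.pop(window, None)

        def predict(self, x):
            # Weighted vote over whatever classifiers are currently active.
            votes = {}
            for classifier, weight in self.members.values():
                label = classifier.predict(x)
                votes[label] = votes.get(label, 0.0) + weight
            return max(votes, key=votes.get)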
3 Adaptive Selective Ensemble
A detailed description of our system is available in [19,18]. In the following subsections, we introduce only the main concepts of our approach, highlighting the relations between the requirements outlined in Section 2.1 and the aggregate structures introduced. The proposed structures are primarily conceived to capture evolving data features while guaranteeing data reduction at the same time. Ensuring a good trade-off between data reduction and a powerful representation of all the evolving data factors is a non-trivial task.
3.1 The Snapshot
The snapshot definition is directly tied to the naïve Bayes classifier. In our model, the streaming training set is partitioned into chunks. Each data chunk is transformed into an approximate, more compact form, called snapshot.
Definition 2 (Snapshot). Given a data chunk of k elements, with A attributes and C class values, a snapshot computes the distribution of the values of each attribute a ∈ A with class value c ∈ C, considering the last k elements arrived:

S_k : C × A → freq_{a,k,c},  ∀ a ∈ A, c ∈ C
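As a minimal illustration of Definition 2, assuming categorical attributes, the following sketch builds the per-class frequency tables from a chunk of k labelled examples; build_snapshot and the toy chunk are our own illustration, not taken from [19,18].

    from collections import defaultdict

    def build_snapshot(chunk, class_index):
        # For every pair (class value c, attribute index a), store the
        # frequency of each attribute value observed in the chunk, i.e.
        # freq_{a,k,c} for the last k = len(chunk) elements (Definition 2).
        snapshot = defaultdict(lambda: defaultdict(int))
        for example in chunk:
            c = example[class_index]
            for a, v in enumerate(example):
                if a != class_index:
                    snapshot[(c, a)][v] += 1
        return snapshot

    # These counts are the sufficient statistics of a naive Bayes
    # classifier: P(v | c) can be estimated as snapshot[(c, a)][v]
    # divided by the number of chunk elements with class c.
    chunk = [("sunny", "high", "no"), ("rain", "high", "yes"),
             ("sunny", "low", "yes"), ("rain", "low", "yes")]
    snap = build_snapshot(chunk, class_index=2)
    print(dict(snap[("yes", 0)]))  # {'rain': 2, 'sunny': 1}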
The following properties are directly derived from Definition 2.
Property 1. Given a stream with C class values and A attributes, a snapshot is a set of C × A tuples.