Data Streams Classification: A Selective Ensemble with Adaptive Behavior - Agents and Artificial Intelligence


where, given a class value c and an attribute a, $\mathit{freq}^{\,j,\,i-1}_{a,k,c}$ refers to the distribution of the values of attribute a with class value c in the j-th snapshot of order i − 1.

Figure 2 shows the relation between snapshots and their order. The aim is to employ a set of snapshots created directly from the stream to build new ones, representing increasingly larger data windows, simply by summing the frequencies of their elements. A high-order snapshot satisfies Property 1, since it has the same structure as a basic one. Moreover, it also satisfies Requirement 3, since the creation of a new high-order snapshot is linear in the number of attributes and class values. The creation of high-order snapshots does not imply any loss of information. This guarantees that a set of sliding windows of different sizes is managed simultaneously while accessing the data stream only once, so that the approach can treat every window as if it were computed directly from the stream.
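
The summation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: representing a snapshot as a Counter keyed by (attribute, value, class) triples, and the function names make_snapshot and merge_snapshots, are assumptions made for the example.

```python
from collections import Counter
from functools import reduce


def make_snapshot(rows):
    """Build a basic snapshot from a chunk of labelled stream elements.

    Assumption: each row is a pair (attributes, label), where attributes
    is a dict mapping attribute names to discrete values.
    """
    freq = Counter()
    for attributes, label in rows:
        for name, value in attributes.items():
            freq[(name, value, label)] += 1
    return freq


def merge_snapshots(snapshots):
    """Build a higher-order snapshot by summing frequencies element-wise.

    The result has the same structure as its inputs, so it can be
    merged again to cover an even larger window.
    """
    return reduce(lambda a, b: a + b, snapshots, Counter())
```

Under these assumptions, merging the snapshots of two consecutive chunks yields the same counts as building one snapshot over both chunks, which is the sense in which no information is lost.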

From a snapshot, or a high-order one, the system extracts an approximated decision tree, or employs the snapshot directly as a naïve Bayes classifier.
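
To illustrate why a table of per-class attribute frequencies can act as a naïve Bayes classifier without further training, here is a minimal Python sketch. It assumes a snapshot stored as a mapping from (attribute, value, class) triples to counts, plus a separate table of class counts; the function name predict, the Laplace smoothing parameter alpha, and this data layout are illustrative assumptions, not details taken from the chapter.

```python
import math


def predict(freq, class_counts, instance, alpha=1.0):
    """Return the most probable class for `instance` under naive Bayes.

    freq:         dict mapping (attribute, value, class) -> count
    class_counts: dict mapping class -> number of elements of that class
    instance:     dict mapping attribute -> value
    alpha:        Laplace smoothing constant (an assumption of this sketch)
    """
    total = sum(class_counts.values())

    # Distinct values observed per attribute, needed for smoothing.
    values = {}
    for (name, value, _label) in freq:
        values.setdefault(name, set()).add(value)

    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        # Log prior of the class.
        score = math.log(n_c / total)
        # Add the log likelihood of each attribute value given the class.
        for name, value in instance.items():
            count = freq.get((name, value, label), 0)
            n_values = len(values.get(name, {value}))
            score += math.log((count + alpha) / (n_c + alpha * n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

All quantities needed for the prediction are read from the stored counts, so the same function applies unchanged to basic and high-order snapshots.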

3.2 The Frame

Snapshots are stored to maximize the number of elements available for training classifiers. A model mined from a small set of elements tends to be less accurate than one extracted from a large data set. While this observation is obvious in “traditional” mining contexts, where training sets are carefully built to maximize model reliability, it does not necessarily hold in a stream environment. Due to concept drift, a model extracted from a large set of data can be less accurate than one mined from a small training set, because the large data set may consist mainly of out-of-date concepts.

Snapshots are then stored and managed, based on their order, in a structure called Frame. The order of a snapshot defines its level of time granularity. Conceptually similar to the Pyramidal Time Frame introduced by Aggarwal et al. in [1] and inherited by the logarithmic tilted-time window, our structure sorts snapshots based on the number of elements from which each snapshot was created.

Definition 4 (Frame). Given a level value i and a level capacity j, a frame is a function that, given a pair of indexes (x, y), returns a snapshot of order x and position y:

$$F_{i,j} : (x, y) \rightarrow \mathit{Snapshot}_{x,y}$$

where $x \in \{0, \ldots, i-1\}$ and $y \in \{0, \ldots, j-1\}$.

As shown in Figure 3, level 1 contains snapshots created directly from the stream. Upper levels use the snapshots of the level immediately below to create new ones. The maximum number of snapshots available in the frame is constant over time and is defined by the number of levels and the level capacity. Within each level, snapshots are stored with a FIFO policy. The frame's memory occupation is constant over time and is linear in the number of snapshots that the structure can store.
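
A frame with these properties can be sketched in Python as below. This is a hedged illustration only. The class name Frame, the parameter group, and the promotion rule (every group insertions at a level, the most recent group snapshots are summed into one snapshot of the next level) are assumptions, since the text here does not specify how many lower-level snapshots are combined or when. Levels are indexed from zero, following Definition 4, and position 0 is taken to be the oldest snapshot in a level.

```python
from collections import Counter, deque


class Frame:
    """Fixed-size store of snapshots organised by level (order)."""

    def __init__(self, levels, capacity, group=2):
        if group > capacity:
            raise ValueError("group must not exceed the level capacity")
        # One bounded FIFO queue per level: memory use stays constant.
        self.levels = [deque(maxlen=capacity) for _ in range(levels)]
        self.inserted = [0] * levels  # insertions seen at each level
        self.group = group            # hypothetical promotion parameter

    def get(self, x, y):
        """Return the snapshot of order x at position y."""
        return self.levels[x][y]

    def add(self, snapshot, level=0):
        """Insert a snapshot; the oldest one is dropped if the level is full."""
        self.levels[level].append(snapshot)
        self.inserted[level] += 1
        has_upper = level + 1 < len(self.levels)
        if has_upper and self.inserted[level] % self.group == 0:
            # Sum the most recent snapshots into one higher-order snapshot.
            recent = list(self.levels[level])[-self.group:]
            self.add(sum(recent, Counter()), level + 1)
```

With levels=3 and capacity=2, for example, the frame never holds more than six snapshots, regardless of how many elements have arrived from the stream.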
