where, given a class value $c$ and an attribute $a$, $\mathit{freq}^{\,j,\,i-1}_{a,k,c}$ refers to the distribution
of the values of attribute $a$ with class value $c$ of the $j$-th snapshot of order $i-1$.
Figure 2 shows the relation between snapshots and their order. The aim is to employ
a set of snapshots created directly from the stream to build new ones, representing
increasingly larger data windows, simply by summing the frequencies of their elements.
A high-order snapshot satisfies Property 1, since it has the same structure as a basic
one. Moreover, it also satisfies Requirement 3, since the creation of a new high-order
snapshot is linear in the number of attributes and class values. The creation of
high-order snapshots does not imply any loss of information. This guarantees
that a set of sliding windows of different sizes can be managed simultaneously while
accessing the data stream only once, so that every window can be treated as if it had
been computed directly from the stream.
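The summation described above can be sketched in a few lines. This is only an illustration under stated assumptions: a snapshot is modelled as a table of counts keyed by (attribute, value, class), and the names make_snapshot and merge_snapshots are hypothetical, not taken from the original system.

```python
from collections import Counter


def make_snapshot(records):
    """Build a basic snapshot from a chunk of the stream.

    `records` is an iterable of (attributes, label) pairs, where
    `attributes` maps attribute names to values. The snapshot counts
    how often each (attribute, value, class) triple occurs.
    (Assumed representation, for illustration only.)
    """
    freq = Counter()
    for attributes, label in records:
        for attr, value in attributes.items():
            freq[(attr, value, label)] += 1
    return freq


def merge_snapshots(snapshots):
    """Build a higher-order snapshot by summing lower-order frequencies."""
    merged = Counter()
    for snap in snapshots:
        merged.update(snap)  # Counter.update adds counts key by key
    return merged


chunk_a = [({"outlook": "sunny"}, "no"), ({"outlook": "rain"}, "yes")]
chunk_b = [({"outlook": "sunny"}, "no"), ({"outlook": "sunny"}, "yes")]

high_order = merge_snapshots([make_snapshot(chunk_a), make_snapshot(chunk_b)])
direct = make_snapshot(chunk_a + chunk_b)
```

Because only counts are added, the merged snapshot equals the one computed directly over the larger window, which is the "no loss of information" property stated above.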
From a snapshot, or a high-order one, the system extracts an approximate decision
tree, or employs the snapshot directly as a naïve Bayes classifier.
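To illustrate why a table of frequencies is enough for naïve Bayes classification, the following sketch predicts a class from counts alone. It assumes a snapshot keyed by (attribute, value, class), no missing attribute values, and Laplace smoothing; the function name classify and the parameter attr_values are hypothetical, and the smoothing choice is an assumption rather than something the text specifies.

```python
import math
from collections import Counter


def classify(snapshot, instance, attr_values):
    """Predict a class label using only the counts stored in a snapshot.

    snapshot:    Counter mapping (attribute, value, label) -> count
    instance:    dict mapping attribute -> observed value
    attr_values: dict mapping attribute -> number of distinct values
                 (needed for Laplace smoothing, an assumption here)
    """
    # Every record contributes exactly once per attribute, so summing the
    # counts of a single attribute recovers the class frequencies.
    first_attr = next(iter(attr_values))
    class_counts = Counter()
    for (attr, _value, label), n in snapshot.items():
        if attr == first_attr:
            class_counts[label] += n
    total = sum(class_counts.values())

    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for attr, value in instance.items():
            count = snapshot[(attr, value, label)]  # 0 if never seen
            score += math.log((count + 1) / (n_c + attr_values[attr]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


snapshot = Counter({
    ("outlook", "sunny", "no"): 3,
    ("outlook", "rain", "no"): 1,
    ("outlook", "sunny", "yes"): 1,
    ("outlook", "rain", "yes"): 3,
})
sunny_prediction = classify(snapshot, {"outlook": "sunny"}, {"outlook": 2})
rain_prediction = classify(snapshot, {"outlook": "rain"}, {"outlook": 2})
```

No training step is needed beyond counting, so a snapshot of any order can serve as a classifier as soon as it exists.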
3.2 The Frame
Snapshots are stored to maximize the number of elements available for training
classifiers. A model mined from a small set of elements tends to be less accurate than
one extracted from a large data set. While this observation is obvious in “traditional”
mining contexts, where training sets are carefully built to maximize model reliability,
in a stream environment it is not necessarily true. Due to concept drift, a model
extracted from a large set of data can be less accurate than one mined from a small
training set, because the large data set may consist mainly of out-of-date concepts.
Snapshots are then stored and managed, based on their order, in a structure called
Frame. The order of a snapshot defines its level of time granularity. Conceptually
similar to the Pyramidal Time Frame introduced by Aggarwal et al. in [1] and inherited
from the logarithmic tilted-time window, our structure sorts snapshots based on the
number of elements from which each snapshot was created.
Definition 4 (Frame). Given a level value i and a level capacity j, a frame is a
function that, given a pair of indexes (x, y), returns a snapshot of order x and position y:
$$F_{i,j} : (x, y) \rightarrow \mathit{Snapshot}_{x,y}$$
where $x \in \{0, \ldots, i-1\}$ and $y \in \{0, \ldots, j-1\}$.
As shown in Figure 3, level 1 contains snapshots created directly from the stream.
Upper levels use the snapshots of the level immediately below to create new ones. The
maximum number of snapshots available in the frame is constant in time and is defined
by the number of levels and the level capacity. Within each level, snapshots are stored
with a FIFO policy. The memory occupied by the frame is therefore constant in time and
linear in the number of snapshots that the structure can store.
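A minimal sketch of such a structure follows. Several details are assumptions made only for illustration: levels are indexed from 0 as in Definition 4, position 0 is the oldest snapshot of a level, and a new upper-level snapshot is built every time `fanout` snapshots have entered the level below. The text does not state the promotion rule, and the names Frame, add, get, and fanout are hypothetical.

```python
from collections import Counter, deque


class Frame:
    """Fixed-size store of snapshots, organised by level (order)."""

    def __init__(self, levels, capacity, fanout=2):
        if fanout > capacity:
            raise ValueError("fanout must not exceed the level capacity")
        # deque(maxlen=...) silently drops the oldest item when full: FIFO.
        self.levels = [deque(maxlen=capacity) for _ in range(levels)]
        self.inserted = [0] * levels  # insertions seen so far per level
        self.fanout = fanout          # assumed promotion rule

    def add(self, snapshot, level=0):
        """Store a snapshot and, when due, build one for the level above."""
        self.levels[level].append(snapshot)
        self.inserted[level] += 1
        due = self.inserted[level] % self.fanout == 0
        if due and level + 1 < len(self.levels):
            merged = Counter()
            for snap in list(self.levels[level])[-self.fanout:]:
                merged.update(snap)  # sum the frequencies
            self.add(merged, level + 1)

    def get(self, x, y):
        """Return the snapshot of order x at position y (0 = oldest)."""
        return self.levels[x][y]

    def size(self):
        return sum(len(level) for level in self.levels)


frame = Frame(levels=3, capacity=2)
for _ in range(8):
    frame.add(Counter({("a", "v", "c"): 1}))
```

With 3 levels of capacity 2, the frame never holds more than 6 snapshots regardless of how long the stream runs, which matches the constant memory bound described above.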
 