Fig. 1. Descriptions of concept C1 and C2
All the dense areas in the n-dimensional space can be viewed as an approximate
description of c1. When the captured data is streaming, it takes time
for the description of the concept to become identifiable. During this period,
some dense areas may appear earlier and some later. Let's assume
we also capture the data within the concept cycle immediately following
concept cycle c1 and mark it as c2. Again, we can approximately describe
c2 by its dense areas (see Fig. 1). Now, the question that needs to be
addressed is how to identify the boundary between c1 and c2. In other words,
when the data is streaming, how can we know whether it is still in the forming
period of cycle c1 or has already entered cycle c2? In order to address
this problem, we first view the dense area in a concept cycle as composed
of a group of adjacent dense cells. The size of a dense cell is learned from
static training data extracted from the corresponding data stream, as
discussed in Sect. 4. When the data is streaming, data points keep falling into
the corresponding cells, so that some cells come to hold more points
than a threshold θ_n (θ_n is learned from static training data as well)
and become dense. We timestamp the moment at which each cell becomes dense. In
this way, we maintain a time series of timestamps that mark the occurrences
of new dense cells. Whenever a new timestamp is added to this time series,
a linear regression is conducted to predict the next timestamp t_pred. When
the real next timestamp t_next is marked, we calculate the difference between
t_next and t_pred. If (t_next − t_pred) ≥ θ_t, we view the new dense cell formed at
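
The procedure described above can be sketched as follows. This is only a minimal illustration under several assumptions that the text does not fix: cells are taken to be axis-aligned hypercubes of a single side length `cell_size`, the regression is taken to be an ordinary least-squares line fitted to the timestamps against their occurrence index, and a boundary is flagged when the real timestamp is later than the predicted one by at least a threshold `theta_t`. The names `predict_next`, `BoundaryDetector`, `observe` and `demo` are hypothetical, and the numeric values in `demo` are made up for illustration rather than learned from training data.

```python
from collections import defaultdict
from math import floor


def predict_next(timestamps):
    """Fit a least-squares line to (index, timestamp) pairs and
    return its value at the next index, or None if fewer than two
    timestamps are available."""
    n = len(timestamps)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(timestamps) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (t - mean_y) for i, t in enumerate(timestamps))
    slope = sxy / sxx
    return mean_y + slope * (n - mean_x)


class BoundaryDetector:
    """Tracks when grid cells become dense and flags late arrivals."""

    def __init__(self, cell_size, theta_n, theta_t):
        self.cell_size = cell_size      # side length of a cell (assumed cubic)
        self.theta_n = theta_n          # density threshold on points per cell
        self.theta_t = theta_t          # assumed threshold on t_next - t_pred
        self.counts = defaultdict(int)  # points seen so far in each cell
        self.dense = set()              # cells that are already dense
        self.dense_times = []           # timestamps of new dense cells

    def observe(self, point, t):
        """Feed one data point arriving at time t. Return True if the point
        makes a cell dense later than the regression predicted."""
        cell = tuple(floor(x / self.cell_size) for x in point)
        self.counts[cell] += 1
        if cell in self.dense or self.counts[cell] <= self.theta_n:
            return False
        # The cell has just exceeded theta_n points: stamp the time.
        self.dense.add(cell)
        t_pred = predict_next(self.dense_times)
        self.dense_times.append(t)
        return t_pred is not None and (t - t_pred) >= self.theta_t


def demo():
    """Three cells become dense at t = 3, 6, 9; a fourth only at t = 30."""
    detector = BoundaryDetector(cell_size=1.0, theta_n=2, theta_t=5.0)
    stream = []
    for start, centre in [(1, (0.5, 0.5)), (4, (1.5, 0.5)),
                          (7, (2.5, 0.5)), (28, (5.5, 5.5))]:
        for k in range(3):
            stream.append((centre, start + k))
    return [t for point, t in stream if detector.observe(point, t)]
```

In `demo`, the first three dense cells appear at a steady pace, so the fitted line predicts the fourth at t = 12; the fourth actually appears at t = 30, and only that timestamp is flagged. What should happen to the time series after a flagged timestamp (for example, whether it is restarted) is not covered by the passage above, so the sketch simply keeps appending.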