7 Preliminary Experimental Results
We conducted experiments on several synthetic data streams. The main goals
of our experimentation are (1) to test the effectiveness and efficiency of
our approach in capturing concepts and detecting concept drifts in continuous
multi-dimensional streaming data, and (2) to test the space efficiency of our
storage structure. To simulate a live data stream, we designed a data stream
server that continuously streams the chosen data set to a receiving client.
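The server is not described in more detail. A minimal sketch of such a server/client pair follows, assuming points travel over TCP as newline-terminated, comma-separated lines; the wire format, the function names (stream_dataset, receive_points, serve_once), and the optional fixed delay between points are our assumptions, not the authors' implementation.

```python
import socket
import time
from typing import Iterable, Iterator, Sequence

def stream_dataset(conn: socket.socket, records: Iterable[Sequence[float]],
                   delay: float = 0.0) -> int:
    """Server side: send each record as one comma-separated line; return the count."""
    sent = 0
    for record in records:
        line = ",".join(repr(float(v)) for v in record) + "\n"
        conn.sendall(line.encode("ascii"))
        sent += 1
        if delay > 0:
            time.sleep(delay)  # fixed pause to mimic a steady arrival rate
    return sent

def receive_points(conn: socket.socket) -> Iterator[tuple[float, ...]]:
    """Client side: yield one point per received line until the server closes."""
    with conn.makefile("r", encoding="ascii") as reader:
        for line in reader:
            yield tuple(float(v) for v in line.strip().split(","))

def serve_once(host: str, port: int, records: Iterable[Sequence[float]],
               delay: float = 0.0) -> None:
    """Accept a single client and stream the whole data set to it, then close."""
    with socket.create_server((host, port)) as server:
        conn, _ = server.accept()
        with conn:
            stream_dataset(conn, records, delay)
```

A client would connect with socket.create_connection((host, port)) and iterate over receive_points on the returned socket to consume points in arrival order.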
7.1 Synthetic Data Streams
We generated the streaming data so that it streams concepts one by one
over time. Each concept is composed of several dense areas mixed with noise
data. While a concept is being streamed, the points belonging to it are picked
in random order and sent out one at a time. Table 1 summarizes the main
characteristics of the synthetic data streams.
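The generator itself is not specified. The sketch below produces data in the same spirit under stated assumptions: each dense area is an isotropic Gaussian blob clamped to the unit hypercube, noise is uniform over that hypercube, and the cluster spread and noise count are illustrative choices, not the settings behind Table 1. The function names are hypothetical.

```python
import random
from typing import Iterator

Point = tuple[float, ...]

def make_concept(n_clusters: int, dims: int, points_per_cluster: int,
                 n_noise: int, spread: float, rng: random.Random) -> list[Point]:
    """One concept: several Gaussian dense areas plus uniform noise in [0, 1]^dims."""
    points: list[Point] = []
    for _ in range(n_clusters):
        center = [rng.uniform(0.2, 0.8) for _ in range(dims)]
        for _ in range(points_per_cluster):
            # clamp so every coordinate stays inside the unit hypercube
            points.append(tuple(min(1.0, max(0.0, rng.gauss(c, spread)))
                                for c in center))
    for _ in range(n_noise):
        points.append(tuple(rng.random() for _ in range(dims)))
    return points

def concept_stream(concepts: list[list[Point]],
                   rng: random.Random) -> Iterator[tuple[int, Point]]:
    """Stream concepts one after another; within a concept, emit points in random order."""
    for concept_id, points in enumerate(concepts):
        order = points[:]
        rng.shuffle(order)
        for p in order:
            yield concept_id, p
```

For example, make_concept(3, 8, 100, 30, 0.03, rng) builds one concept with three dense areas in eight dimensions, comparable to the averages reported in Table 1.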
7.2 Pre-Learning Parameters
For each synthetic data stream, the first concept is used as training data,
from which the density threshold and the grid resolution are learned by
applying a genetic algorithm. Table 2 lists the learned parameter values for
each data stream.
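The chromosome encoding, GA operators, and fitness function are not given here. The following is a minimal, hypothetical sketch that assumes: an individual holds one cell resolution per dimension plus an integer density threshold (mirroring the columns of Table 2); a point falls into cell floor(x_d / r_d) along each dimension d; a cell is dense when it holds at least θ_n points; and the fitness, which we invent for illustration, rewards dense cells that capture the training concept's cluster points while excluding its noise points. All names are ours, and the operators (uniform crossover, Gaussian mutation, size-3 tournament, elitism) are generic choices, not necessarily the authors'.

```python
import math
import random
from collections import Counter
from typing import Sequence

# (per-dimension cell resolution, density threshold): a hypothetical encoding
Individual = tuple[tuple[float, ...], int]

def cell_of(p: Sequence[float], res: Sequence[float]) -> tuple[int, ...]:
    """Map a point to its grid cell: one integer index per dimension."""
    return tuple(math.floor(x / r) for x, r in zip(p, res))

def fitness(ind: Individual, clustered: list, noise: list) -> float:
    """HYPOTHETICAL objective in [-1, 1]: share of cluster points in dense cells
    minus share of noise points in dense cells."""
    res, theta = ind
    c_cells = [cell_of(p, res) for p in clustered]
    n_cells = [cell_of(p, res) for p in noise]
    counts = Counter(c_cells + n_cells)
    dense = {cell for cell, n in counts.items() if n >= theta}
    cluster_hit = sum(c in dense for c in c_cells) / len(c_cells)
    noise_hit = sum(c in dense for c in n_cells) / len(n_cells)
    return cluster_hit - noise_hit

def random_individual(dims: int, max_theta: int, rng: random.Random) -> Individual:
    return tuple(rng.uniform(0.01, 1.0) for _ in range(dims)), rng.randint(1, max_theta)

def crossover(a: Individual, b: Individual, rng: random.Random) -> Individual:
    """Uniform crossover: each gene is copied from either parent with p = 0.5."""
    res = tuple(x if rng.random() < 0.5 else y for x, y in zip(a[0], b[0]))
    return res, (a[1] if rng.random() < 0.5 else b[1])

def mutate(ind: Individual, max_theta: int, rng: random.Random,
           rate: float = 0.2) -> Individual:
    """Gaussian nudge on resolutions, integer step on the threshold, both clamped."""
    res = tuple(min(1.0, max(0.01, r + rng.gauss(0.0, 0.1))) if rng.random() < rate else r
                for r in ind[0])
    theta = ind[1]
    if rng.random() < rate:
        theta = min(max_theta, max(1, theta + rng.randint(-10, 10)))
    return res, theta

def learn_parameters(clustered: list, noise: list, generations: int = 30,
                     pop_size: int = 30, max_theta: int = 100, seed: int = 0):
    """Return (best individual, its fitness) after a simple elitist GA."""
    rng = random.Random(seed)
    dims = len(clustered[0])
    pop = [random_individual(dims, max_theta, rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(i, clustered, noise), i) for i in pop),
                        key=lambda s: s[0], reverse=True)
        elite = [ind for _, ind in scored[:2]]  # keep the two best unchanged

        def pick() -> Individual:  # size-3 tournament selection
            return max(rng.sample(scored, 3), key=lambda s: s[0])[1]

        pop = elite + [mutate(crossover(pick(), pick(), rng), max_theta, rng)
                       for _ in range(pop_size - len(elite))]
    best = max(pop, key=lambda i: fitness(i, clustered, noise))
    return best, fitness(best, clustered, noise)
```

Because the two best individuals survive every generation unchanged, the best fitness never decreases from one generation to the next; the labeled split into cluster and noise points is only available because the training data is synthetic.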
7.3 Detecting Concepts and Concept Drift
For a data stream, at any given time, a TSP trie for the current concept cycle
is maintained in memory. As described earlier, the root of the TSP
Table 1. Characteristics of the synthetic data streams

Stream no.   Avg. number of clusters per concept   Dimensions   Avg. data size per concept
1            3.33                                  8            0.43 MB
2            3.6                                   8            0.45 MB
3            3.42                                  8            0.42 MB
4            3.22                                  8            0.45 MB
Table 2. Pre-learned parameter values

        Cell resolution (dimensions 1-8)
No.     1      2      3      4      5      6      7      8      θ_n
1       0.06   0.28   0.73   0.35   0.39   0.76   0.59   0.58   66
2       0.93   0.02   0.11   0.4    0.91   0.80   0.83   0.62   77
3       0.65   0.60   0.81   0.24   0.24   0.06   0.34   0.38   90
4       0.01   0.48   0.129  0.08   0.15   0.25   0.14   0.13   98