Capturing Concepts and Detecting Concept-Drift from Potential Unbounded, Ever-Evolving and High-Dimensional Data Streams - Data Mining: Foundations and Practice


7 Preliminary Experimental Results

We conducted experiments on several synthetic data streams. The main goals
of our experimentation were (1) to test the effectiveness and efficiency of
our approach in capturing concepts and detecting concept drifts in continuous
multi-dimensional streaming data, and (2) to test the space efficiency of our
storage structure. To simulate a data stream, we designed a data stream server
that continuously streams the chosen data set to a receiving client.
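The server-to-client setup described above can be illustrated with a minimal sketch. It is not the implementation used in the experiments: the function names, the JSON-lines wire format, and the use of a local socket pair in place of a networked server are all assumptions made for illustration.

```python
import json
import socket
import threading

def serve_stream(conn, records):
    """Server side (hypothetical): send each record as one JSON line, then close."""
    with conn:
        for record in records:
            conn.sendall((json.dumps(record) + "\n").encode("utf-8"))

def receive_stream(conn):
    """Client side (hypothetical): yield records one at a time as they arrive."""
    with conn, conn.makefile("r", encoding="utf-8") as lines:
        for line in lines:
            yield json.loads(line)

def run_demo(records):
    """Stream `records` from a sender thread to a receiver over a local socket pair."""
    server_end, client_end = socket.socketpair()
    sender = threading.Thread(target=serve_stream, args=(server_end, records))
    sender.start()
    received = list(receive_stream(client_end))
    sender.join()
    return received
```

Calling run_demo with a list of points returns the same points in the order they were sent, which is the property the receiving client relies on when it processes the stream one record at a time.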

7.1 Synthetic Data Streams

We created the streaming data so that concepts are streamed one by one over
time. Each concept is composed of several dense areas mixed with noise data.
When a concept is streamed, its points are randomly picked and sent out one
by one. Some characteristics of the synthetic data we use are presented in
Table 1.
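One way such a stream could be generated is sketched below. The generator details are assumptions, not the procedure used in the experiments: dense areas are modelled as Gaussian blobs around hypothetical centers, noise is drawn uniformly from the unit hypercube, and all names and sizes are chosen for illustration.

```python
import random

def make_concept(centers, points_per_cluster, spread, noise_points, rng):
    """Build one concept: a Gaussian blob around each center plus uniform noise."""
    dims = len(centers[0])
    points = []
    for center in centers:
        for _ in range(points_per_cluster):
            points.append([rng.gauss(c, spread) for c in center])
    for _ in range(noise_points):
        points.append([rng.random() for _ in range(dims)])
    return points

def stream_concepts(concepts, rng):
    """Emit concepts one after another; within a concept, points go out in random order."""
    for concept_id, points in enumerate(concepts):
        order = list(points)
        rng.shuffle(order)
        for point in order:
            yield concept_id, point

# Two illustrative 8-dimensional concepts with 2 and 3 dense areas.
rng = random.Random(0)
concepts = [
    make_concept([[0.2] * 8, [0.7] * 8], 50, 0.02, 10, rng),
    make_concept([[0.4] * 8, [0.9] * 8, [0.1] * 8], 50, 0.02, 10, rng),
]
stream = list(stream_concepts(concepts, rng))
```

The concept identifier is carried along only so that a detected drift can be compared with the true concept boundary. A detector under test would see the points alone.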

7.2 Pre-Learning Parameters

For each synthetic data stream, the first concept is used as the training
data, from which the dense threshold and the grid resolution are learned by
applying a genetic algorithm. Table 2 lists the learned parameter values for
each data stream.
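To make the role of the two learned parameters concrete, the sketch below shows how a per-dimension resolution and a dense threshold might be applied once they are known. It rests on assumptions not stated in this section: each resolution is treated as a cell width along its dimension, and a cell counts as dense when it holds at least the threshold number of points. The genetic search itself is not shown, and the function names are hypothetical.

```python
import math
from collections import Counter

def cell_of(point, resolution):
    """Map a point to its grid cell: one integer index per dimension.

    Assumes each resolution value is the cell width along that dimension.
    """
    return tuple(math.floor(x / r) for x, r in zip(point, resolution))

def dense_cells(points, resolution, threshold):
    """Return the cells holding at least `threshold` points (assumed density rule)."""
    counts = Counter(cell_of(p, resolution) for p in points)
    return {cell for cell, n in counts.items() if n >= threshold}

# Parameter values listed for stream 1 in Table 2.
STREAM1_RESOLUTION = [0.06, 0.28, 0.73, 0.35, 0.39, 0.76, 0.59, 0.58]
STREAM1_THRESHOLD = 66
```

Under these assumptions, a finer resolution in one dimension (such as 0.06 in dimension 1 of stream 1) splits that dimension into many more cells than a coarse one (such as 0.76 in dimension 6), so the two parameters jointly control how compact a region must be to register as dense.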

7.3 Detecting Concepts and Concept Drift

For a data stream, at any given time, a TSP trie for the current concept
cycle is maintained in memory. As described earlier, the root of the TSP

Table 1. Characteristics of the synthetic data streams

Stream no.   Avg. number of clusters   Dimensions   Average data size
             per concept                            per concept
1            3.33                      8            0.43 MB
2            3.6                       8            0.45 MB
3            3.42                      8            0.42 MB
4            3.22                      8            0.45 MB

Table 2. Pre-learned parameter values

No.   Cell resolution (dimensions 1-8)                               θ_n
      1       2       3       4       5       6       7       8
1     0.06    0.28    0.73    0.35    0.39    0.76    0.59    0.58   66
2     0.93    0.02    0.11    0.4     0.91    0.80    0.83    0.62   77
3     0.65    0.60    0.81    0.24    0.06    0.34    0.38           90
4     0.01    0.48    0.129   0.08    0.15    0.25    0.14    0.13   98

Only seven resolution values are available for stream 3. They are listed in
order, so their column positions are uncertain.
