Segmentation of Continuous Data Streams Based on a Change Detection Methodology - Advanced Techniques in Knowledge Discovery and Data Mining

4.3.3 Results - “Stock” Data Set

This section summarizes the segmentation and analysis of the stock data set, which was analyzed with the IFN algorithm. The main expectation for this data set was that significant changes would be detected over time between the disjoint segments into which the full data set is partitioned. Such changes would indicate that the full data stream can be treated as several disjoint data sets, with a separate underlying model built and applied for each of them.

The full data set holds information about stocks in 5462 records. As there is no predefined way to segment the given data stream, three different segmentations were implemented and evaluated using the change-detection methodology.

A natural way to segment the data stream is to use a "time field" as an indicator of accumulating knowledge, since records were added to the database incrementally. Table 4.5 describes how the data set is divided into segments according to the incremental date field in the stock data set.
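The time-based segmentation can be sketched as follows: order the records by their date field, then cut the ordered sequence at fixed record boundaries. This is a minimal sketch under assumptions: the StockRecord type, its record_date and payload fields, and the segment_by_time helper are hypothetical and not taken from the chapter, and records after the last boundary are simply left out (as records 5001 to 5462 are in Table 4.5).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StockRecord:
    # Hypothetical record layout: only the time field matters for segmentation.
    record_date: date
    payload: dict


def segment_by_time(records, boundaries):
    """Split records into disjoint, consecutive segments.

    `boundaries` lists the 1-based index of the last record in each segment,
    counted after the records have been ordered by their time field.
    Records beyond the last boundary are not assigned to any segment.
    """
    ordered = sorted(records, key=lambda r: r.record_date)
    segments = []
    start = 0
    for end in boundaries:
        # Each boundary must move forward and stay within the data.
        if not start < end <= len(ordered):
            raise ValueError(f"invalid boundary {end}")
        segments.append(ordered[start:end])
        start = end
    return segments
```

For the first trial in Table 4.5, the boundaries would be [1000, 2000, 3000, 4000, 5000].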

Table 4.5. Segmentation of the “stock” data set.

Trial num.   Segment num.   Record interval
1            1              [1,1000]
1            2              [1001,2000]
1            3              [2001,3000]
1            4              [3001,4000]
1            5              [4001,5000]
2            1              [1,1500]
2            2              [1501,3000]
2            3              [3001,4500]
2            4              [4501,5000]
3            1              [1,2000]
3            2              [2001,2500]
3            3              [2501,3000]
3            4              [3001,3500]
3            5              [3501,4000]
3            6              [4001,4500]
3            7              [4501,5000]
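As a sanity check on the table, the three trials can be encoded as lists of 1-based inclusive record intervals and tested for being gap-free, non-overlapping partitions of records 1 to 5000. This is an illustrative sketch; the TRIALS dictionary and the helper names are hypothetical and simply transcribe Table 4.5.

```python
# Record intervals of the three trials in Table 4.5 (1-based, inclusive).
TRIALS = {
    1: [(1, 1000), (1001, 2000), (2001, 3000), (3001, 4000), (4001, 5000)],
    2: [(1, 1500), (1501, 3000), (3001, 4500), (4501, 5000)],
    3: [(1, 2000), (2001, 2500), (2501, 3000), (3001, 3500),
        (3501, 4000), (4001, 4500), (4501, 5000)],
}


def is_contiguous_partition(intervals, first=1, last=5000):
    """True if the ordered intervals are disjoint and cover [first, last] without gaps."""
    expected_start = first
    for lo, hi in intervals:
        # Each interval must start right after the previous one ends.
        if lo != expected_start or hi < lo:
            return False
        expected_start = hi + 1
    return expected_start == last + 1


def segment_sizes(intervals):
    """Number of records in each inclusive interval."""
    return [hi - lo + 1 for lo, hi in intervals]


for trial, intervals in TRIALS.items():
    print(trial, is_contiguous_partition(intervals), segment_sizes(intervals))
# 1 True [1000, 1000, 1000, 1000, 1000]
# 2 True [1500, 1500, 1500, 500]
# 3 True [2000, 500, 500, 500, 500, 500, 500]
```

All three trials cover the same 5000 records; only the first uses equally sized segments, while the second ends with a 500-record segment and the third starts with a 2000-record segment followed by six 500-record segments.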

The first trial partitions the first 5000 accumulated records into five equally sized data sets of 1000 records each. Figure 4.3 describes the outcome of applying the change-detection methodology to these segments.
