Segmentation of Continuous Data Streams Based on a Change Detection Methodology - Advanced Techniques in Knowledge Discovery and Data Mining

by these outcomes. To answer this specific question, a preferred algorithm for searching for the optimal (or a suboptimal) segmentation should rely on the following assumptions and characteristics:

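1. The complexity of the IFN algorithm for data mining, like that of most classification data-mining algorithms, is O(n), where n is the number of records. Because candidate segmentations are evaluated by building a model on each segment, this cost is incurred for every segmentation examined and should be taken into consideration.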

2. There should be a limit on the number of candidate segmentations and a minimum size for each segment. Otherwise, the change-detection method would not be useful, because each segment would contain too little information.

3. The choice of the set of statistical parameters and their weights should be considered.

4. An initial segmentation (a simple partition into k segments) should be created first and then refined into a relevant segmentation by merging and splitting segments.

5. The search algorithm should have a stopping criterion. The search itself can use any of the many available methods (greedy search, golden-section search, genetic algorithms, etc.).

6. An automated segmentation procedure should support user interaction during the segmentation process (see, for example, Nouira and Fouet [31]).
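The characteristics above (an initial partition into k segments, a minimum segment size, a greedy search, and a stopping criterion) can be combined into a small search loop. The following is a minimal sketch under stated assumptions, not the chapter's procedure: it assumes a stream of binary labels, uses the absolute difference in positive-label rate as a stand-in dissimilarity (the chapter's own statistical parameters and weights would replace it), and implements only the merging half of item 4. The names `initial_partition`, `proportion_gap`, `greedy_merge`, `tolerance`, and `max_merges` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A segment is a half-open index range [start, end) over the stream.
Segment = Tuple[int, int]


def initial_partition(n: int, k: int, min_size: int) -> List[Segment]:
    """Split n records into k contiguous segments of nearly equal size."""
    if k < 1 or n < k * min_size:
        raise ValueError("cannot form k segments of at least min_size records")
    bounds = [i * n // k for i in range(k + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(k)]


def proportion_gap(labels: Sequence[int], a: Segment, b: Segment) -> float:
    """Hypothetical stand-in dissimilarity: gap in positive-label rate."""
    def rate(s: Segment) -> float:
        return sum(labels[s[0]:s[1]]) / (s[1] - s[0])
    return abs(rate(a) - rate(b))


def greedy_merge(
    labels: Sequence[int],
    k: int,
    min_size: int,
    tolerance: float,
    dissimilarity: Callable[[Sequence[int], Segment, Segment], float] = proportion_gap,
    max_merges: Optional[int] = None,
) -> List[Segment]:
    """Greedy bottom-up search: start from k segments and repeatedly merge the
    most similar adjacent pair. Merging only enlarges segments, so the
    min_size constraint enforced by the initial partition keeps holding."""
    segments = initial_partition(len(labels), k, min_size)
    merges = 0
    while len(segments) > 1 and (max_merges is None or merges < max_merges):
        gaps = [dissimilarity(labels, segments[i], segments[i + 1])
                for i in range(len(segments) - 1)]
        best = min(range(len(gaps)), key=gaps.__getitem__)
        # Stopping criterion: every adjacent pair differs by more than tolerance.
        if gaps[best] > tolerance:
            break
        segments[best:best + 2] = [(segments[best][0], segments[best + 1][1])]
        merges += 1
    return segments


if __name__ == "__main__":
    stream = [0] * 40 + [1] * 40  # hypothetical stream with one change point
    print(greedy_merge(stream, k=8, min_size=5, tolerance=0.1))
    # prints [(0, 40), (40, 80)]
```

Each iteration recomputes the dissimilarity for every adjacent pair, so the cost grows with both the number of records and the number of merges examined; this is consistent with the concern about computational cost in item 1. A splitting step, or a different search strategy such as a genetic algorithm, would replace the loop body rather than the overall structure.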

4.3.4 Summary of Experiments

The following statements summarize the results:

1. In the “Dropout” database, the change-detection procedure reveals significant changes in the extracted data-mining model built from the data accumulated during 2000, validating the base assumption for this database.

2. In the “Dropout” database, applying the set of rules derived from the 1996-1999 data to the year 2000 and beyond would produce an expected error rate of at least 22% on average.

3. By applying the change-detection approach to the “stock” data set, we detected significant changes between successive segments and compared the quality of two alternative segmentations in order to identify the better segmentation of the data set.

4. The “stock” data set shows that a better segmentation of a data stream can be chosen on the basis of statistical analysis and a ranking scheme.

5. Our change-detection methodology may serve as the basis for an automated procedure aimed at finding the best segmentation of a given data stream, although such a procedure may be computationally expensive.
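This excerpt does not state which statistic underlies the “significant changes” reported above. One standard way to check whether a model's error rate on a later period differs significantly from its rate on the period it was built from is a two-proportion z-test with a pooled rate; the sketch below shows that test as an assumption, not necessarily the exact statistic used in the chapter. The function name `error_rate_change` and the counts in the usage lines are hypothetical and are not the “Dropout” or “stock” results.

```python
import math
from typing import Tuple


def error_rate_change(errors_old: int, n_old: int,
                      errors_new: int, n_new: int) -> Tuple[float, float]:
    """Two-proportion z-test for a change in a model's error rate.

    errors_old / n_old: errors on the period the model was built from;
    errors_new / n_new: errors when the same model is applied to a later period.
    Returns (z, two-sided p-value) under the normal approximation.
    """
    p_old = errors_old / n_old
    p_new = errors_new / n_new
    pooled = (errors_old + errors_new) / (n_old + n_new)  # rate under "no change"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_old + 1 / n_new))
    if se == 0:
        raise ValueError("pooled error rate is 0 or 1; the test is undefined")
    z = (p_new - p_old) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts, not taken from the chapter's experiments.
    z, p = error_rate_change(errors_old=100, n_old=1000, errors_new=220, n_new=1000)
    print(f"z = {z:.2f}, p = {p:.1e}")
```

A small p-value (for example, below 0.05) indicates that the error rate on the later period differs significantly from the original one. With large segments, even modest differences become significant, which is one reason a minimum segment size and a careful choice of statistical parameters matter.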

4.4 Conclusions and Future Work

As mentioned earlier, many data-mining models are constructed on the assumption that the data used to build and verify the model are the best estimators of what will happen in the future.