Database Reference
In-Depth Information
by these outcomes. To answer this specific question, a preferred algorithm for
searching the optimal (or suboptimal) segmentation should rely on the following
assumptions and characteristics:
1. The complexity of the IFN algorithm for data mining like most classification
data-mining algorithms is O( n ). This should be taken into consideration.
2. There should be a limit on the number of possible segmentations and a minimal
size for each segment. Otherwise, the change-detection method would not be
useful due to insufficient information in each segment.
3. The choice of set of statistical parameters and weights should be considered.
4. An initial segmentation should be implemented (a simple partition of k
segments) and then explored into a relevant segmentation method by merging
and dividing segments.
5. The search algorithm should have a stopping criterion. Also, the search method
can be one of many search methods available (greedy, golden section, genetic
algorithms, etc.).
6. An automated segmentation procedure should have capabilities for user
interaction in the segmentation process (for example, see Nouira and Fouet
[31]).
4.3.4 Summary of Experiments
The following statements summarize the results:
1. In the “Dropout” database, the change-detection procedure reveals significant
changes in the extracted data-mining model, which was built from the data
accumulated during 2000, validating the base assumption for this database.
2. In the “Dropout” database, the expected error rate of using the same set of
rules, based on 1996-1999 on the year 2000 and beyond, would produce at least
22% error on average.
3. By applying the change-detection approach to the “stock” data set, we have
detected significant changes between succeeding segments and have compared
the quality of two alternative segmentations to provide a better segmentation of
the data set.
4. It is shown in the “stock” data set that a better segmentation of a data stream
can be chosen based on a statistical analysis and ranking schema.
5. Our change-detection methodology may be utilized as a basis for an automated
procedure aimed at finding the best segmentation of a given data stream but it
may be computationally expensive.
4.4 Conclusions and Future Work
As mentioned earlier, many data-mining models are constructed based on the
assumption that the data involved in building and verifying the model are the best
estimators of what will happen in the future.
Search WWH ::




Custom Search