Trends in Data Mining and Knowledge Discovery - Advanced Techniques in Knowledge Discovery and Data Mining


is given in [33]. One approach for dealing with scalability of DM tools is connected with the notion of meta-mining. Meta-mining generates meta-knowledge from knowledge generated by data mining tools [67]. It is done by dividing data into subsets, generating data models for these subsets, and generating meta-knowledge from these data models. In this approach small data models are processed as input data instead of huge amounts of the original data, which greatly reduces computational overhead [46], [48].
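As an illustration only, the three stages named above (dividing the data, modeling each subset, and summarizing the models) can be sketched in Python. The frequent-item "model", the sample records, and the function names split_into_subsets, mine_subset, and meta_mine are assumptions made for this sketch; they are not the method of [67].

```python
from collections import Counter


def split_into_subsets(records, n_subsets):
    """Divide the data into at most n_subsets contiguous parts of similar size."""
    size = -(-len(records) // n_subsets)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def mine_subset(subset, min_support=0.5):
    """Toy data model (assumed): items present in at least min_support of the records."""
    counts = Counter(item for record in subset for item in set(record))
    return {item for item, c in counts.items() if c / len(subset) >= min_support}


def meta_mine(models):
    """Toy meta-knowledge (assumed): in how many subset models each item appears."""
    return Counter(item for model in models for item in model)


# Hypothetical transaction data; the first three records form one subset,
# the last three the other.
records = [
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk"},
    {"bread", "tea"}, {"tea", "milk"}, {"bread", "tea"},
]

subsets = split_into_subsets(records, n_subsets=2)
models = [mine_subset(s) for s in subsets]   # small models, one per subset
meta = meta_mine(models)                     # input here is the models, not the raw data
common = {item for item, n in meta.items() if n == len(models)}
```

Note that meta_mine only ever sees the small per-subset models, which is the source of the reduced overhead described in the text; whether a pattern holds across all subsets (here, the contents of common) is one simple form such meta-knowledge might take.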

5. Evaluation of the discovered knowledge

This step includes interpretation of the results by the owners of the data, who check whether the new information is truly novel and interesting, and assessment of the impact of the discovered knowledge. Only the approved models (results of applying many data mining tools and preprocessing methods) are kept. The entire DMKD process may be revisited to identify which alternative actions could be taken to improve the results.

6. Using the discovered knowledge

This step is entirely in the hands of the owners of the database. It consists of planning where and how the discovered knowledge will be used. The application area in the current domain should be extended to other domains within an organization. A plan to monitor the implementation of the discovered knowledge should be created and the entire project documented.

The six-step DMKD process model described above is visualized in Fig. 1.3. Important parts of the process are its iterative and interactive aspects. The feedback loops are necessary since any changes and decisions made in one of the steps can result in changes in later steps. The model uses several such feedback mechanisms:

• from Step 2 to Step 1 because additional domain knowledge may be needed to better understand the data.

• from Step 3 to Step 2 because additional or more specific information about the data may be needed before choosing specific data preprocessing algorithms (for instance, data transformation or discretization).

• from Step 4 to Step 1 when the selected DM tools do not generate satisfactory results, and thus the project goals must be modified.

• from Step 4 to Step 2 when the data were misinterpreted, causing the failure of a DM tool (e.g., data were misrecognized as continuous and discretized in Step 3). The most common scenario is when it is unclear which DM tool should be used because of poor understanding of the data.

• from Step 4 to Step 3 to improve data preparation because of the specific requirements of the DM tool used, which may not have been known during the data preparation step.

• from Step 5 to Step 1 when the discovered knowledge is not valid. There are several possible sources of such a situation: incorrect understanding or interpretation of the domain or incorrect design or understanding of problem restrictions, requirements, or goals. In these cases the entire DMKD process needs to be repeated.

• from Step 5 to Step 4 when the discovered knowledge is not novel, interesting,
