is given in [33]. One approach for dealing with scalability of DM tools is
connected with the notion of meta-mining. Meta-mining generates meta-
knowledge from knowledge generated by data mining tools [67]. It is done by
dividing data into subsets, generating data models for these subsets, and
generating meta-knowledge from these data models. In this approach small
data models are processed as input data instead of huge amounts of the
original data, which greatly reduces computational overhead [46], [48].
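The three stages described above (divide the data into subsets, build a data model per subset, derive meta-knowledge from the models) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from [67]: the toy "data model" is a table mapping each attribute value to its majority class, the "meta-knowledge" is the set of rules on which enough subset models agree, and the names split_into_subsets, build_model, meta_mine, and min_support are hypothetical.

```python
from collections import Counter, defaultdict


def split_into_subsets(records, n_subsets):
    """Divide the data into n_subsets parts (round-robin assignment)."""
    return [records[i::n_subsets] for i in range(n_subsets)]


def build_model(subset):
    """Toy data model (an assumption): majority class per (attribute, value)."""
    counts = defaultdict(Counter)
    for attributes, label in subset:
        for name, value in attributes.items():
            counts[(name, value)][label] += 1
    return {cond: c.most_common(1)[0][0] for cond, c in counts.items()}


def meta_mine(models, min_support=1.0):
    """Toy meta-knowledge: rules found in at least min_support of the models."""
    rule_counts = Counter()
    for model in models:
        rule_counts.update(model.items())  # each rule is (condition, class)
    threshold = min_support * len(models)
    return {rule for rule, n in rule_counts.items() if n >= threshold}


# Made-up example data: "outlook" predicts the class, "windy" is noise.
rows = [
    ("sunny", "no", "play"), ("sunny", "yes", "play"),
    ("sunny", "no", "play"), ("sunny", "yes", "play"),
    ("sunny", "yes", "play"), ("sunny", "no", "play"),
    ("rainy", "yes", "stay"), ("rainy", "no", "stay"),
    ("rainy", "yes", "stay"), ("rainy", "no", "stay"),
    ("rainy", "no", "stay"), ("rainy", "yes", "stay"),
]
records = [({"outlook": o, "windy": w}, label) for o, w, label in rows]

subsets = split_into_subsets(records, n_subsets=2)
models = [build_model(s) for s in subsets]   # small models replace raw data
meta_knowledge = meta_mine(models)           # rules all subset models share

for rule in sorted(meta_knowledge):
    print(rule)
```

In this sketch the meta-mining stage reads only the small rule tables, never the original records, which is where the reduction in computational overhead would come from. Rules that hold in only one subset (here, the ones based on "windy") are filtered out when min_support is 1.0.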
5. Evaluation of the discovered knowledge
In this step the owners of the data interpret the results, check whether the
new information is truly novel and interesting, and assess the impact of the
discovered knowledge. Only the approved models (results of applying many data
mining tools and preprocessing methods) are kept. The entire DMKD process may
be revisited to identify which alternative actions could be taken to improve
the results.
6. Using the discovered knowledge
This step is entirely in the hands of the owners of the database. It consists of
planning where and how the discovered knowledge will be used. The
application area in the current domain should be extended to other domains
within an organization. A plan to monitor the implementation of the
discovered knowledge should be created and the entire project documented.
The six-step DMKD process model described above is visualized in Fig. 1.3.
Important parts of the process are its iterative and interactive aspects. The
feedback loops are necessary since any changes and decisions made in one of the
steps can result in changes in later steps. The model uses several such feedback
mechanisms:
• from Step 2 to Step 1, because additional domain knowledge may be needed to
better understand the data.
• from Step 3 to Step 2, because additional or more specific information about
the data may be needed before choosing specific data preprocessing
algorithms (for instance, data transformation or discretization).
• from Step 4 to Step 1, when the selected DM tools do not generate satisfactory
results and the project goals must therefore be modified.
• from Step 4 to Step 2, when the data were misinterpreted, causing the
failure of a DM tool (e.g., data were misrecognized as continuous and
discretized in Step 3). The most common scenario is when it is unclear which
DM tool should be used because of poor understanding of the data.
• from Step 4 to Step 3, to improve data preparation because of specific
requirements of the DM tool used, which may not have been known during
the data preparation step.
• from Step 5 to Step 1, when the discovered knowledge is not valid. This can
have several causes: incorrect understanding or interpretation of the domain,
or incorrect design or understanding of problem restrictions, requirements,
or goals. In these cases the entire DMKD process needs to be repeated.
• from Step 5 to Step 4 when the discovered knowledge is not novel, interesting,