The design of a framework for a knowledge discovery process is an important
issue. Several researchers have described a series of steps that constitute the KD
process; they range from very simple models, incorporating few steps that usually
include data collection and understanding, data mining, and implementation, to
more sophisticated models like the nine-step model proposed by Fayyad et al. [31].
In this chapter we describe the six-step DMKD process model [18], [19]. The
advantage of this model is that it is based on an industry-initiated study that led to
the development of an industry- and tool-independent DM process model [26], and
that it has been successfully used in medical applications [18], [44], [47], [61].
1.1.1. XML: Key to Unlocking Data Mining and Knowledge Discovery
One of the technologies that can help in carrying out the DMKD process is XML
(eXtensible Markup Language), a standard proposed by the World Wide Web
Consortium (W3C) [12]. It is a subset of SGML that uses custom-defined tags [42]. XML allows for
description and storage of structured or semistructured data and their relationships.
One of the most important features of XML is that it can be used to exchange data
in a platform-independent way. XML is also easy to use, because many off-the-shelf
tools are available for processing it automatically. From the point of view of the
DMKD process, XML is a crucial technology because it helps to:
• standardize communication between diverse DM tools and databases. This
  may result in a new generation of DM tools that can communicate with a
  number of different database products.
• build standard data repositories that share data between different DM tools
  and work on different software platforms. This may help to consolidate the
  DMKD market and open it to new applications.
• implement communication protocols between the DM tools. This may result
  in the development of DM toolboxes [45] that consist of different DM tools,
  developed by different companies, but that are able to communicate and
  provide protocols to extract consolidated, more understandable, accurate, and
  easily applicable knowledge.
• provide a framework for integration of and communication between different
  DMKD steps. For instance, the information collected during the domain and
  data understanding steps can be stored as XML documents. They can then be
  used in the data preparation and data mining steps as a source of information
  that can be accessed automatically, across platforms and across tools. In
  addition, the extracted knowledge can be stored using XML and PMML
  (Predictive Model Markup Language) documents. This may enable
  automated sharing of discovered knowledge between the diverse domains and
  tools that use it, as long as they are XML- and PMML-compliant.
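The last point, passing information from one DMKD step to the next as an XML document, can be illustrated with a minimal sketch. It assumes a hypothetical document layout (a `dataset` element containing `attribute` elements with `name`, `type`, and `missing` attributes); these element names, the helper functions, and the sample values are invented for illustration and are not part of the DMKD model or of PMML. Only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET


def describe_dataset(name, attributes):
    """Data understanding step: record what was learned about the data.

    `attributes` is a list of (name, type, missing_fraction) tuples.
    The element and attribute names used here are hypothetical.
    """
    root = ET.Element("dataset", {"name": name})
    for attr_name, attr_type, missing in attributes:
        ET.SubElement(
            root,
            "attribute",
            {"name": attr_name, "type": attr_type, "missing": str(missing)},
        )
    # Return the document as text so any tool on any platform can read it.
    return ET.tostring(root, encoding="unicode")


def attributes_needing_imputation(xml_text, threshold=0.0):
    """Data preparation step: read the stored notes back automatically."""
    root = ET.fromstring(xml_text)
    return [
        attr.get("name")
        for attr in root.iter("attribute")
        if float(attr.get("missing")) > threshold
    ]


# Illustrative values only; they do not come from a real data set.
notes = describe_dataset(
    "patients",
    [("age", "numeric", 0.0), ("diagnosis", "nominal", 0.12)],
)
print(attributes_needing_imputation(notes))
```

Because the notes travel as plain XML text, the step that writes them and the step that reads them need not share a tool, a programming language, or a platform; they only need to agree on the document layout.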
Because DMKD is a very complex process, which includes DM as only one of
its steps, XML's utility in automating and consolidating the DMKD process is
hard to overstate: it makes the process platform- and tool-independent.
A number of other XML goals defined by the W3C, like support of a variety of