hand, the research in OLAP (on-line analytical processing) and data warehouses
peaked around 1999. Some of the trends that initially had the greatest impact on
the DM field began to decline, possibly because most of the issues in those
areas have been solved, and attention thus shifted toward new areas and
applications. Moreover, new trends emerged that have great potential to benefit
the DMKD field, like XML and related technologies, database products that
incorporate DM tools, and new developments in the design and implementation of
the DMKD process. Among these, XML technology may have the greatest
influence since it helps to tie DM to other technologies such as databases or
e-commerce. XML can also help to standardize the I/O procedures, which will help
to consolidate the DM market and to carry out the DMKD process; see Fig. 1.2(a).
The other important DMKD issue is the relationship between theoretical DM
research and DM applications; see Fig. 1.2(b). The number of DM application
papers increased rapidly over the last few years. The growth rate of theoretical
research was slower initially but accelerated around 1998 and then started to slow
down. This trend suggests that more attention has been given to
practical applications, possibly because of increased funding levels. This
situation, however, calls for a more balanced approach because applications need
to be well grounded in theory if real progress is to be made. We need new DM
tools that can handle the huge amounts of textual data generated by the Internet,
as well as tools to extract knowledge from hypertext and images, such as those
often encountered in biology and medicine.
In short, DMKD is an exponentially growing field with a strong emphasis
on applications.
1.2 Six-Step Knowledge Discovery and Data Mining Process
The goal of designing a DMKD process model is to come up with a set of
processing steps that can be followed by practitioners when they execute their
DMKD projects. Such a process model should help to plan and work through a
project, and to reduce its cost, by detailing the procedures to be performed in each step. The
DMKD process model should provide a complete description of all the steps, from
problem specification to deployment of the results.
A useful DMKD process model must be validated in real-life applications. One
such initiative was taken by the CRISP-DM (CRoss-Industry Standard Process for
Data Mining) group [72], [26]. Their design was based on a study supported by
several European companies (automotive, aerospace, telecommunications,
consultancy, insurance, data warehousing, and a developer of DM tools). The project
included two inseparable ingredients of any DMKD process: databases and DM
tools. Two companies (OHRA and DaimlerChrysler) provided large-scale
applications to validate the DMKD process model. The goal of the project was to
develop a DMKD process that would help to save project costs, shorten project
time, and adopt DM as a core part of the business. As a result, the six-step DM
process was developed: business understanding, data understanding, data