analysis of large databases. A key aspect that characterizes the KDD process is the
way it is divided into stages according to the agreement of several important researchers
in the topic. There are several methods available to make this division, each with
advantages and disadvantages [16]. In this topic, we adopt a hybridization widely
used in recent years that categorizes these stages into six steps:
1. Problem Specification: Designating and arranging the application domain, the
relevant prior knowledge obtained by experts and the final objectives pursued by
the end-user.
2. Problem Understanding: Including the comprehension of both the selected data
to approach and the associated expert knowledge, in order to achieve a high degree
of reliability.
3. Data Preprocessing: This stage includes operations for data cleaning (such as
handling the removal of noise and inconsistent data), data integration
(where multiple data sources may be combined into one), data transformation
(where data is transformed and consolidated into forms which are appropriate for
specific DM tasks or aggregation operations) and data reduction, including the
selection and extraction of both features and examples in a database. This phase
will be the focus of study throughout the topic.
4. Data Mining: It is the essential process where the methods are used to extract
valid data patterns. This step includes the choice of the most suitable DM task
(such as classification, regression, clustering or association), the choice of the
DM algorithm itself, belonging to one of the previous families, and finally the
employment and accommodation of the selected algorithm to the problem, by
tuning essential parameters and validation procedures.
5. Evaluation: Estimating and interpreting the mined patterns based on interesting-
ness measures.
6. Result Exploitation: The last stage may involve using the knowledge directly,
incorporating the knowledge into another system for further processing, or simply
reporting the discovered knowledge through visualization tools.
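As a rough illustration of the four families of operations listed under Data
Preprocessing (step 3), the following minimal Python sketch chains cleaning,
integration, transformation and reduction over two toy sources. The records, field
names and function names are all hypothetical, and each function is only one simple
instance of its family (for example, cleaning here just drops inconsistent records),
not a method prescribed by the text.

```python
# Hypothetical toy sources sharing the key "id" (illustrative data only).
source_a = [
    {"id": 1, "age": 25, "income": 30000.0},
    {"id": 2, "age": -5, "income": 42000.0},  # inconsistent: negative age
    {"id": 3, "age": 47, "income": 58000.0},
    {"id": 4, "age": 33, "income": 44000.0},
]
source_b = [
    {"id": 1, "city": "Granada"},
    {"id": 2, "city": "Jaen"},
    {"id": 3, "city": "Granada"},
    {"id": 4, "city": "Jaen"},
]


def clean(records, field):
    """Data cleaning: drop records whose `field` is missing or negative."""
    return [r for r in records if r[field] is not None and r[field] >= 0]


def integrate(left, right, key):
    """Data integration: combine two sources by joining them on `key`."""
    lookup = {r[key]: r for r in right}
    return [{**rec, **lookup[rec[key]]} for rec in left if rec[key] in lookup]


def transform(records, field):
    """Data transformation: min-max scale `field` into the range [0, 1]."""
    values = [r[field] for r in records]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero for constant fields
    return [{**r, field: (r[field] - low) / span} for r in records]


def reduce_features(records, keep):
    """Data reduction: keep only the selected features."""
    return [{k: r[k] for k in keep} for r in records]


def preprocess(a, b):
    """Chain the four operations in the order the list above names them."""
    step = clean(a, "age")
    step = integrate(step, b, "id")
    step = transform(step, "income")
    return reduce_features(step, ["id", "age", "income"])


prepared = preprocess(source_a, source_b)
```

Here the record with a negative age is removed during cleaning, so `prepared` holds
three records whose `income` values are rescaled relative to the remaining minimum
and maximum.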
Figure 1.1 summarizes the KDD process and reveals the six stages mentioned
previously. It is worth mentioning that all the stages are interconnected, showing that
the KDD process is actually a self-organized scheme where each stage conditions
the remaining stages and the reverse path is also allowed.
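To make the Data Mining stage (step 4) more concrete, the sketch below fixes the task
(classification) and the algorithm (k-nearest neighbours), then tunes one essential
parameter, k, with a validation procedure (leave-one-out). The choice of k-NN, the toy
data and every function name are assumptions made for illustration; the text does not
prescribe any of them.

```python
from collections import Counter

# Hypothetical two-class toy data: (feature vector, label).
toy_data = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b"),
]


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    def squared_distance(point):
        return sum((a - b) ** 2 for a, b in zip(point[0], query))

    nearest = sorted(train, key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out validation: hold out each example once and predict it."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


def tune_k(data, candidates):
    """Pick the candidate k with the highest leave-one-out accuracy.

    Ties are resolved in favour of the earliest candidate in the list.
    """
    return max(candidates, key=lambda k: loo_accuracy(data, k))


best_k = tune_k(toy_data, [1, 3, 5])
```

On this toy set, k = 5 forces every held-out point to be outvoted by the opposite
class, which shows why the parameter is validated rather than fixed in advance.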
1.2 Data Mining Methods
A large number of techniques for DM are well-known and used in many applications.
This section provides a short review of selected techniques considered the most
important and frequent in DM. This review only highlights some of the main features
of the different techniques and some of the influences related to data preprocessing
procedures presented in the remaining chapters of this topic. Our intention is not to
 