Fig. 1.1 KDD process
provide a complete explanation of how these techniques operate in detail, but to
stay focused on the data preprocessing step.
Figure 1.2 shows a division of the main DM methods according to two ways
of obtaining knowledge: prediction and description. In the following, we give
a short description of each method, including references to some representative
and concrete algorithms and the major considerations from the point of view of data
preprocessing.
Within the prediction family of methods, two main groups can be distinguished:
statistical methods and symbolic methods [4]. Statistical methods are usually
characterized by the representation of knowledge through mathematical models with
computations. In contrast, symbolic methods prefer to represent the knowledge by
means of symbols and connectives, yielding more interpretable models for humans.
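The contrast between the two groups can be illustrated with a minimal sketch. The attribute names, the coefficients and the rule below are hypothetical values chosen only for illustration; they do not come from any fitted model or from the text.

```python
import math

# Statistical view: the knowledge is a set of numeric coefficients
# (hypothetical values, not learned from any data).
WEIGHTS = {"age": 0.04, "income": -0.00002}
BIAS = -1.0

def statistical_predict(record):
    """Logistic-style model: weighted sum of attributes passed through a sigmoid."""
    score = BIAS + sum(WEIGHTS[name] * record[name] for name in WEIGHTS)
    probability = 1.0 / (1.0 + math.exp(-score))
    return probability >= 0.5

# Symbolic view: the knowledge is a rule built from symbols and connectives
# (hypothetical thresholds), which a person can read directly.
def symbolic_predict(record):
    """IF age > 40 AND income < 30000 THEN positive."""
    return record["age"] > 40 and record["income"] < 30000

older_low_income = {"age": 60, "income": 20000}
younger_high_income = {"age": 20, "income": 50000}
```

Both functions return a Boolean class label, but only the rule states its reasoning in a form that can be read without computing anything.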
The most widely applied statistical methods are:
• Regression Models: being the oldest DM models, they are used in estimation tasks
and require the class of equation used for modelling to be specified [24]. Linear,
quadratic and logistic regression are the best-known regression models in DM. They
impose some basic requirements on the data: they need numerical attributes, they
are not designed to deal with missing values, they try to fit outliers to the
model, and they use all the features regardless of whether they are useful or
dependent on one another.
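Two of the requirements listed above, the lack of support for missing values and the sensitivity to outliers, can be seen in a small sketch. It assumes a single numerical feature, ordinary least squares in closed form, and `None` as the marker for a missing value; the toy data and the helper names `fit_line` and `drop_missing` are illustrative assumptions, not part of any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def drop_missing(xs, ys):
    """Keep only the pairs in which both values are present (None = missing)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    return [p[0] for p in pairs], [p[1] for p in pairs]

# Toy data following y = 2x + 1, with one missing x value and one outlier.
xs = [0.0, 1.0, 2.0, 3.0, None, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 41.0]  # the last y is an outlier (11.0 would fit the line)

# The raw data cannot be fitted: arithmetic on None raises TypeError.
try:
    fit_line(xs, ys)
    raw_fit_failed = False
except TypeError:
    raw_fit_failed = True

# After removing the incomplete pair the fit runs, but the outlier distorts it.
clean_xs, clean_ys = drop_missing(xs, ys)
slope_with_outlier, intercept_with_outlier = fit_line(clean_xs, clean_ys)

# Removing the outlier as well recovers the underlying line y = 2x + 1.
slope_without, intercept_without = fit_line(clean_xs[:-1], clean_ys[:-1])
```

Dropping incomplete examples and discarding the outlier by hand are the simplest possible choices here; imputing the missing value or using a robust estimator are alternatives that keep more of the data.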
 