The data to be analyzed may reside in well-organized data marts and data
warehouses or may be extracted from various unstructured data sources. A data
mining procedure has many stages. It typically involves extensive data management
before the application of a statistical or machine learning algorithm and the
development of an appropriate model. Specialized software packages (data mining
tools) have been developed to support the whole data mining procedure.
Data mining models consist of a set of rules, equations, or complex "transfer
functions" that can be used to identify useful data patterns and to understand and
predict behaviors. They can be grouped into two main classes according to their
goal, as follows.
SUPERVISED/PREDICTIVE MODELS
In supervised modeling, also called predictive, directed, or targeted modeling, the
goal is to predict an event or estimate the values of a continuous numeric
attribute. In these models
there are input fields or attributes and an output or target field. Input fields are
also called predictors because they are used by the model to identify a prediction
function for the output field. We can think of predictors as the X part of the
function and the target field as the Y part, the outcome.
The model analyzes the input fields with respect to their effect on the target
field. Pattern recognition is "supervised" by the target field: relationships are
established between input and output fields. The model generates an input-output
mapping "function" that associates predictors with the output and permits the
prediction of the output values, given the values of the input fields.
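As a rough illustration of such an input-output mapping, the sketch below fits a least-squares line to a handful of made-up (X, Y) pairs and returns a prediction function. The data, the single predictor, and the name fit_line are assumptions chosen for illustration; a straight line is only one simple form the mapping could take.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the predictor and its co-variation with the target.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


# Hypothetical training data: predictor X and target Y (here Y = 2X + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

predict = fit_line(xs, ys)  # the learned input-output mapping
```

Calling predict(5.0) on an unseen input then applies the learned mapping to estimate the output for that case.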
Predictive models are further categorized into classification and estimation
models:
Classification or propensity models: In these models the target groups or
classes are known from the start. The goal is to classify the cases into these
predefined groups; in other words, to predict an event. The generated model
can be used as a scoring engine for assigning new cases to the predefined classes.
It also estimates a propensity score for each case. The propensity score denotes
the likelihood of occurrence of the target group or event.
Estimation models: These models are similar to classification models, with one
major difference: they predict the value of a continuous field, rather than a
class, based on the observed values of the input attributes.
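The classification case, with its propensity score, can be sketched with a small logistic regression trained by gradient descent. This is a minimal sketch under stated assumptions: the churn data, the single predictor, the learning rate, the 0.5 threshold, and the function names are all hypothetical, and logistic regression is just one of many classification techniques.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit weight w and bias b by gradient descent on the average log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def propensity(model, x):
    """Estimated likelihood (0 to 1) that the target event occurs for input x."""
    w, b = model
    return sigmoid(w * x + b)


def classify(model, x, threshold=0.5):
    """Assign a case to class 1 when its propensity reaches the threshold."""
    return 1 if propensity(model, x) >= threshold else 0


# Hypothetical data: x = number of complaints, y = 1 if the customer churned.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

model = fit_logistic(xs, ys)
```

Here propensity(model, x) plays the role of the scoring engine for a new case, and classify(model, x) turns that score into a predefined class. An estimation model would instead return a continuous value directly.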
UNSUPERVISED MODELS
In unsupervised, or undirected, models there is no output field, just inputs. The
pattern recognition is undirected; it is not guided by a specific target attribute.
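To make the absence of a target field concrete, the sketch below groups one-dimensional values into two clusters using only the inputs. Clustering is used here as an assumed, commonly cited example of undirected pattern recognition; the data and the function name two_means are hypothetical.

```python
def two_means(values, iterations=20):
    """Split 1-D values into two groups by alternating assignment and mean updates."""
    # Start the two centers at the extremes of the data.
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to the nearer center; no target field is consulted.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical input field only: there is no output column to guide the grouping.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, groups = two_means(values)
```

The groups emerge from the structure of the inputs alone, in contrast to the supervised sketches, where a known target guided the fit.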