An Overview of Data Mining Techniques - Data Mining Techniques in CRM: Inside Customer Segmentation

Database Reference

In-Depth Information

They consist of neurons organized in layers. The input layer contains the predic-

tors or input neurons. The output layer includes the target field. These models

estimate weights that connect predictors (input layer) to the output. Models

with more complex topologies may also include intermediate, hidden layers,

and neurons. The training procedure is an iterative process. Input records,

with known outcomes, are presented to the network and model prediction is

evaluated with respect to the observed results. Observed errors are used to

adjust and optimize the initial weight estimates. They are considered as opaque

or ''black box'' solutions since they do not provide an explanation of their predic-

tions. They only provide a sensitivity analysis, which summarizes the predictive

importance of the input fields. They require minimum statistical knowledge

but, depending on the problem, may require a long processing time for training.

• Support vector machine(SVM): SVM is a classification algorithm that can

model highly nonlinear, complex data patterns and avoid overfitting, that

is, the situation in which a model memorizes patterns only relevant to the

specific cases analyzed. SVM works by mapping data to a high-dimensional

feature space in which records become more easily separable (i.e., separated

by linear functions) with respect to the target categories. Input training data

are appropriately transformed through nonlinear kernel functions and this

transformation is followed by a search for simpler functions, that is, linear

functions, which optimally separate records. Analysts typically experiment with

different transformation functions and compare the results. Overall SVM is an

effective yet demanding algorithm, in terms of memory resources and processing

time. Additionally, it lacks transparency since the predictions are not explained

and only the importance of predictors is summarized.

• Bayesian networks: Bayesian models are probability models that can be

used in classification problems to estimate the likelihood of occurrences. They

are graphical models that provide a visual representation of the attribute

relationships, ensuring transparency, and an explanation of the model's rationale.

Evaluation of Classification Models

Before applying the generated model in new records, an evaluation procedure is

required to assess its predictive ability. The historical data with known outcomes,

which were used for training the model, are scored and two new fields are derived:

the predicted outcome category and the respective confidence score, as shown in

Table 2.2, which illustrates the procedure for the simplified example presented

earlier.

In practice, models are never as accurate as in the simple exercise presented

here. There are always errors and misclassified records. A comparison of the

predicted to the actual values is the first step in evaluating themodel's performance.

Search WWH ::

Custom Search

Home