An Overview of Data Mining Techniques - Data Mining Techniques in CRM: Inside Customer Segmentation

line. It requires some statistical experience, and because it is sensitive to violations of its assumptions, it may require specific data examination and processing before the model is built. The final model has the intuitive form of a linear function, with coefficients denoting the effect of each predictor on the outcome measure. Although transparent, it has inherent limitations that may affect its predictive performance in complex situations involving nonlinear relationships and interactions between predictors.
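As a minimal sketch of the linear form described above, the Python snippet below fits an ordinary least squares line for a single predictor. The function name `fit_simple_linear_regression` and the tenure/spend figures are hypothetical toy values chosen so that the fit is exact; real regression models usually involve several predictors and would normally be fitted with a statistical package.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data: months as a customer (x) vs. monthly spend (y)
tenure = [1, 2, 3, 4, 5]
spend = [12.0, 14.0, 16.0, 18.0, 20.0]
b0, b1 = fit_simple_linear_regression(tenure, spend)
print(b0, b1)  # 10.0 2.0
```

Here the coefficient 2.0 is read as the effect of one additional month of tenure on the estimated spend. A single straight line like this cannot represent curved relationships or interactions unless extra terms are added by hand.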

Nowadays, traditional regression is not the only available estimation technique. Newer techniques, which make less stringent assumptions and can also capture nonlinear relationships, can be employed to handle continuous outcomes as well. More specifically, neural networks, SVM, and specific types of decision trees, such as Classification and Regression Trees and CHAID, can also be used to predict continuous measures.
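To illustrate how a tree can estimate a continuous outcome, the sketch below performs a single CART-style split: it picks the threshold on one predictor that minimizes the sum of squared errors when each side of the split predicts its own mean. This is a deliberate simplification (one split, one predictor, no stopping or pruning rules) and not the full Classification and Regression Trees algorithm; CHAID selects splits differently, using statistical significance tests. The function names and data are hypothetical.

```python
def sum_squared_error(values):
    """Squared deviations of the values around their own mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def fit_regression_stump(x, y):
    """Return (threshold, left_mean, right_mean) for the best single split on x."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sum_squared_error(left) + sum_squared_error(right)
        if best is None or total < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (total, threshold, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def predict_stump(stump, xi):
    """Each leaf predicts the mean outcome of its training records."""
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


# Hypothetical step-shaped data that a single straight line would fit poorly
x = [1, 2, 3, 4, 5, 6]
y = [10, 10, 10, 30, 30, 30]
stump = fit_regression_stump(x, y)
print(stump)  # (3.5, 10.0, 30.0)
```

Repeating this split search recursively within each side is what lets a regression tree follow nonlinear patterns without any assumption about the functional form.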

The data setup and the implementation procedure of an estimation model are analogous to those of a classification model. The historical dataset is used to train the model. The model is then evaluated for its predictive effectiveness on a disjoint dataset with known outcome values, preferably drawn from a different time period. Finally, the generated model is deployed on unseen data to estimate the unknown target values.
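The procedure above can be sketched in Python under simplifying assumptions: a single predictor, a straight-line model fitted by ordinary least squares, and hypothetical records tagged with a period number. The latest period with known outcomes serves as the disjoint holdout set; the helper `fit_line` and all data values are illustrative only.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one predictor: y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical historical records: (period, predictor, known outcome)
history = [
    (1, 1.0, 3.1), (1, 2.0, 4.9), (1, 3.0, 7.2),
    (2, 4.0, 8.8), (2, 5.0, 11.1), (2, 6.0, 13.0),
    (3, 7.0, 15.2), (3, 8.0, 16.9),
]

# Train on the earlier periods; hold out the latest period, whose outcomes are known
train = [r for r in history if r[0] < 3]
test = [r for r in history if r[0] == 3]
b0, b1 = fit_line([r[1] for r in train], [r[2] for r in train])

# Evaluate on the disjoint, later holdout period (mean absolute error)
mae = mean(abs(y - (b0 + b1 * x)) for _, x, y in test)

# Deploy: estimate the unknown outcome for unseen records
unseen = [9.0, 10.0]
scores = [b0 + b1 * x for x in unseen]
print(round(mae, 3), [round(s, 2) for s in scores])
```

Splitting by period rather than at random mirrors the recommendation to evaluate on a different time period, so the holdout error better reflects how the model will behave on future data.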

When scoring, the model creates one new field: the estimated outcome value. Estimation models are evaluated with respect to the observed errors, that is, the deviations between the predicted and the actual values. Errors are also called residuals.
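A minimal sketch of scoring and residual computation, assuming a hypothetical fitted line with intercept 1.0 and slope 2.0 and a small evaluation set whose actual outcomes are known. The hypothetical `score` helper returns each record with exactly one added field, `predicted`; the residual is computed here as actual minus predicted, a common convention (some texts reverse the sign).

```python
# Hypothetical fitted coefficients, assumed to come from an earlier training step
B0, B1 = 1.0, 2.0


def score(record):
    """Return a copy of the record with one new field: the estimated outcome."""
    scored = dict(record)
    scored["predicted"] = B0 + B1 * record["x"]
    return scored


# Hypothetical evaluation records with known actual outcomes
records = [
    {"x": 1.0, "actual": 3.5},
    {"x": 2.0, "actual": 4.5},
    {"x": 3.0, "actual": 7.0},
]
scored = [score(r) for r in records]

# Residual (error) per record: actual minus predicted
residuals = [r["actual"] - r["predicted"] for r in scored]
print(residuals)  # [0.5, -0.5, 0.0]
```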

A large number of residual diagnostic plots and measures are usually examined to assess the model's predictive accuracy. Error measures typically examined include:

• Correlation measures between the actual and the predicted values, such as the Pearson correlation coefficient. This coefficient measures the linear association between the observed and the predicted values. Values close to 1 indicate a strong relationship and a high degree of association between what was predicted and what actually occurred.

• The relative error. This measure is the ratio of the variance of the observed values around the model's predictions to the variance of the observed values around their mean. It compares the model with a baseline model that simply returns the mean value as the prediction for every record. Smaller values indicate better models. Values greater than 1 indicate models that are less accurate than the baseline model and therefore not useful.

• Mean error or mean squared error across all examined records.

• Mean absolute error (MAE).

• Mean absolute percent error (MAPE).
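The measures listed above can be computed directly from paired actual and predicted values. The sketch below is a plain-Python illustration with hypothetical data; the relative error follows the ratio described above (squared deviations from the predictions over squared deviations from the mean), and MAPE assumes that no actual value is zero. Statistical packages may differ in details, for example reporting MAPE as a fraction rather than a percentage.

```python
import math


def error_measures(actual, predicted):
    """Compute common error measures for paired actual/predicted values."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n

    # Pearson correlation between actual and predicted values
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    pearson_r = cov / (math.sqrt(ss_a) * math.sqrt(ss_p))

    # Relative error: squared deviations from the predictions over squared
    # deviations from the mean (a baseline that always predicts the mean)
    sse_model = sum(r ** 2 for r in residuals)
    relative_error = sse_model / ss_a

    return {
        "pearson_r": pearson_r,
        "relative_error": relative_error,
        "mean_error": sum(residuals) / n,
        "mse": sse_model / n,
        "mae": sum(abs(r) for r in residuals) / n,
        # MAPE in percent; undefined when any actual value is zero
        "mape": 100 * sum(abs(r / a) for r, a in zip(residuals, actual)) / n,
    }


measures = error_measures([10, 20, 30, 40], [12, 18, 33, 37])
print(measures["mean_error"], measures["mae"], measures["mse"])  # 0.0 2.5 6.5
```

In this example the mean error is 0.0 because positive and negative residuals cancel out, while the MAE of 2.5 shows the typical size of an error; this is why absolute or squared measures are usually examined alongside the mean error.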
