line. It requires some statistical experience and since it is sensitive to possible
violations of its assumptions it may require specific data examination and processing
before building. The final model has the intuitive form of a linear function
with coefficients denoting the effect of predictors on the outcome measure.
Although transparent, it has inherent limitations that may affect its predictive
performance in complex situations of nonlinear relationships and interactions
between predictors.
Nowadays, traditional regression is not the only available estimation technique.
Newer techniques, which have less stringent assumptions and can capture
nonlinear relationships, can also be employed to handle continuous outcomes.
More specifically, neural networks, SVM, and specific types of decision trees, such
as Classification and Regression Trees and CHAID, can be used for the
prediction of continuous measures.
The data setup and the implementation procedure of an estimation model
are analogous to those of a classification model. The historical dataset is used
for training the model. The model is then evaluated for its predictive
effectiveness on a disjoint dataset, preferably from a different time period, with
known outcome values. The generated model is then deployed on unseen data to
estimate the unknown target values.
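The train, evaluate, and deploy steps described above can be sketched roughly as follows. This is only an illustrative sketch, not the book's procedure: it assumes a single predictor and an ordinary least-squares line, and the data values and the names `fit_line`, `predict`, `train_x`, `holdout_x`, and `new_x` are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def predict(model, xs):
    """Score records: one estimated outcome value per input record."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


# Hypothetical historical data used for training.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [3.1, 4.9, 7.2, 8.8, 11.0]

# Hypothetical disjoint data from a later period, with known outcomes.
holdout_x = [6.0, 7.0, 8.0]
holdout_y = [13.1, 14.8, 17.2]

# Hypothetical unseen records whose outcome is unknown.
new_x = [9.0, 10.0]

model = fit_line(train_x, train_y)            # 1. train
holdout_pred = predict(model, holdout_x)      # 2. evaluate on disjoint data
holdout_mae = mean(abs(a - p) for a, p in zip(holdout_y, holdout_pred))
estimated = predict(model, new_x)             # 3. deploy on unseen data
```

Evaluating on the later-period holdout rather than on the training records gives a more honest picture of how the model is likely to behave once deployed.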
The model creates one new field when scoring: the estimated outcome value.
Estimation models are evaluated with respect to the observed errors, that is, the
deviations (differences) between the predicted and the actual values. Errors are
also called residuals.
A large number of residual diagnostic plots and measures are usually examined
to assess the model's predictive accuracy. Error measures typically examined
include:
• Correlation measures between the actual and the predicted values, such as
the Pearson correlation coefficient. This coefficient is a measure of the linear
association between the observed and the predicted values. Values close to 1
indicate a strong relationship and a high degree of association between what was
predicted and what is really happening.
• The relative error. This measure denotes the ratio of the variance of the observed
values from those predicted by the model to the variance of the observed values
from their mean. It compares the model with a baseline model that simply
returns the mean value as the prediction for all records. Small values indicate
better models. Values greater than 1 indicate models less accurate than the
baseline model and therefore not useful.
• Mean error or mean squared error across all examined records.
• Mean absolute error (MAE).
• Mean absolute percent error (MAPE).
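As a rough illustration of the error measures listed above, the following sketch computes them from paired actual and predicted values. It rests on a few assumptions that the text does not fix: residuals are taken as actual minus predicted, the relative error is computed as the sum of squared residuals divided by the sum of squared deviations of the actual values from their mean, and MAPE is expressed in percent. The function name `error_measures` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean


def error_measures(actual, predicted):
    """Return common error measures for an estimation model.

    Assumes equal-length sequences, non-constant values, and no actual
    value equal to zero (MAPE divides by the actual values).
    """
    residuals = [a - p for a, p in zip(actual, predicted)]  # assumed sign convention
    a_bar, p_bar = mean(actual), mean(predicted)

    cov = sum((a - a_bar) * (p - p_bar) for a, p in zip(actual, predicted))
    ss_actual = sum((a - a_bar) ** 2 for a in actual)    # deviation from the mean (baseline)
    ss_pred = sum((p - p_bar) ** 2 for p in predicted)
    ss_resid = sum(r * r for r in residuals)              # deviation from the model

    return {
        "pearson_r": cov / sqrt(ss_actual * ss_pred),
        "relative_error": ss_resid / ss_actual,  # > 1 means worse than the mean baseline
        "mean_error": mean(residuals),
        "mse": mean(r * r for r in residuals),
        "mae": mean(abs(r) for r in residuals),
        "mape": 100 * mean(abs(r / a) for r, a in zip(residuals, actual)),
    }


# Hypothetical actual and predicted values for four records.
measures = error_measures([10, 20, 30, 40], [12, 18, 33, 39])
```

Note that the mean error can be close to zero even for a poor model, because positive and negative residuals cancel out; the squared and absolute measures do not have this problem.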