Database Reference
In-Depth Information
outcome of interest, allowing data miners to focus only on the information that
matters.
Field screening models are usually used in the data preparation phase of a
data mining project in order to perform the following tasks:
• Evaluate the quality of potential predictors. They incorporate specific criteria
to identify inadequate predictors: for instance, predictors with an extensive
percentage of missing (null) values, continuous predictors which are constant
or have little variation, categorical predictors with too many categories or with
almost all records falling in a single category.
• Rank predictors according to their predictive power. The influence of each
predictor on the target field is assessed and an importance measure is calculated.
Predictors are then sorted accordingly.
• Filter out unimportant predictors. Predictors unrelated to the target field are
identified. Analysts have the option to filter them out, reducing the set of input
fields to those related to the target field.
PREDICTING CONTINUOUS OUTCOMES WITH ESTIMATION MODELING
Estimation models, also referred to as regression models, deal with continuous
numeric outcomes. By using linear or nonlinear functions they use the input fields
to estimate the unknown values of a continuous target field.
Estimation techniques can be used to predict attributes like the following:
• The expected balance of the savings accounts of bank customers in the near
future.
• The estimated volume of traffic for new customers of a mobile telephony
network operator.
• The expected revenue from a customer for the next year.
A dataset with historical data and known values of the continuous output
is required for training the model. A mapping function is then identified that
associates the available inputs to the output values. These models are also referred
to as regression models, after the well-known and established statistical technique
of ordinary least squares regression (OLSR), which estimates the line that best
fits the data and minimizes the observed errors, the so-called least squares
Search WWH ::




Custom Search