used by the modeling process. There should not be any row identifiers or other
attributes included. Therefore, if your dataset contains unneeded attributes,
before starting the modeler, create a derived set containing only the attributes
that you want to include. See Chapters 2 and 3 for details on how this is done.
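Outside VisMiner, the same preparation step can be sketched in a few lines of Python. This is only an illustration of the idea of a derived set, not VisMiner's own procedure; the attribute names (`row_id`, `sqft`, `age`, `price`) and the helper `derive_dataset` are hypothetical.

```python
# Hypothetical dataset: "row_id" stands in for a row identifier that
# should not be passed to the modeler.
rows = [
    {"row_id": 1, "sqft": 1400, "age": 12, "price": 235000},
    {"row_id": 2, "sqft": 1850, "age": 5, "price": 310000},
    {"row_id": 3, "sqft": 1100, "age": 30, "price": 172000},
]


def derive_dataset(rows, keep):
    """Return new rows containing only the attributes named in `keep`."""
    return [{name: row[name] for name in keep} for row in rows]


# Keep only the predictors and the target; the original rows are left untouched.
derived = derive_dataset(rows, ["sqft", "age", "price"])
print(derived[0])
```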
Tutorial
VisMiner supports the two main objectives of regression analysis in data
mining:
1. Construct a model that accurately predicts continuous numeric output
values using input datasets that were not used to build the model.
2. Learn more about the relationships between potential predictors and the
target output attribute.
With respect to the first objective, the best overall measure of model
performance in regression analysis is R². The recommended method for comparing
alternative models and assessing their ability to generalize is to compare their
R² values on test or validation datasets that are representative of the data to
which the model will be applied.
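As a rough illustration of this comparison, the sketch below computes R² as 1 - SS_res / SS_tot for two sets of predictions on the same held-out data. The values and the names `model_a` and `model_b` are made up for the example and do not come from VisMiner.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical target values from a test set that was not used for training.
test_actual = [3.0, 5.0, 7.0, 9.0]

# Hypothetical predictions from two competing models on that test set.
candidates = {
    "model_a": [2.8, 5.1, 7.2, 8.9],
    "model_b": [4.0, 4.0, 8.0, 8.0],
}

scores = {name: r_squared(test_actual, preds) for name, preds in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The model with the higher R² on the held-out data is the one expected to generalize better, provided the test set is representative.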
The task of building the best model is one of choosing the right combination of
input attributes, then applying those attributes to the modeling algorithm that
best meets the two objectives: prediction accuracy and model understandability.
VisMiner implements three algorithms for regression analysis: linear
regression, polynomial regression, and artificial neural networks (ANN).
From the perspective of interpretability, linear regression is the simplest of
the three, followed by polynomial regression and then ANN regression. ANN-based
models are by far the most difficult to evaluate and understand, and their
inner workings are the hardest to visualize.
On the other hand, ANNs are the most powerful with respect to their ability to
fit the data and thus generate accurate predictions. The ANN algorithm is the
only one of the three that can detect interactions between input attributes.
Both the ANN and the polynomial regression modelers can fit non-linear
relationships in the data. To fit non-linear inputs with a linear regression
model, the user must define and apply attribute transformations before
submitting the dataset to the modeler.
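To make the transformation idea concrete, here is a minimal sketch in plain Python, independent of VisMiner. It assumes a synthetic target that depends on the logarithm of a single input, fits a straight line to the raw input and to the log-transformed input, and compares the two fits. The data and the helpers `fit_line` and `r_squared` are illustrative assumptions.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Synthetic data (an assumption for this example): y = 1 + 2 * ln(x).
xs = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
ys = [1.0 + 2.0 * math.log(x) for x in xs]

# Linear fit on the raw attribute.
raw_slope, raw_intercept = fit_line(xs, ys)
r2_raw = r_squared(ys, [raw_slope * x + raw_intercept for x in xs])

# Linear fit after a user-defined log transformation of the attribute.
log_xs = [math.log(x) for x in xs]
log_slope, log_intercept = fit_line(log_xs, ys)
r2_log = r_squared(ys, [log_slope * x + log_intercept for x in log_xs])

print(r2_raw, r2_log)
```

In this constructed case the transformed attribute is linearly related to the target, so the linear model fits it far better than it fits the raw attribute. With real data, the appropriate transformation has to be chosen by the analyst.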
In conducting a regression analysis using VisMiner, it is recommended that
all three algorithms be applied. Although, in the end, the ANN will likely
generate the best model, the linear and polynomial regression modelers can
assist in choosing predictor attributes, gaining insights into their contributions
to the predicted output, and providing benchmarks against which the performance
of competing models can be judged.
 