Boundary plot - most useful for detecting patterns with respect to political
boundaries. Currently supported boundaries include US state, US county,
three-digit ZIP code, and five-digit ZIP code. If your data is summarized by
any of these political boundaries, or can be summarized to them through
aggregation, use the boundary plot to visualize patterns based on geographic location.
Location plot - most useful for detecting patterns with respect to geographic
point locations encoded as latitude and longitude. If your dataset
contains location information, such as an address, but does not include
latitude and longitude, you can add these values using external
geocoding tools or by joining your dataset with datasets that contain them.
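Joining with a dataset that already holds coordinates is essentially a left join on a shared location key. The sketch below assumes a hypothetical lookup table keyed by ZIP code; the coordinates are rough placeholder values for illustration, not authoritative geocodes, and add_lat_lon is a hypothetical helper name.

```python
# Hypothetical dataset rows: a location key (ZIP code) but no coordinates.
customers = [
    {"id": 1, "zip": "10001"},
    {"id": 2, "zip": "94105"},
    {"id": 3, "zip": "60601"},
]

# Hypothetical lookup table: ZIP code -> (latitude, longitude).
# Placeholder values only; real values would come from a geocoding tool
# or a reference dataset.
zip_coords = {
    "10001": (40.75, -73.99),
    "94105": (37.79, -122.39),
}

def add_lat_lon(rows, lookup):
    """Left-join rows with the lookup; unmatched rows get None coordinates."""
    joined = []
    for row in rows:
        lat, lon = lookup.get(row["zip"], (None, None))
        joined.append({**row, "lat": lat, "lon": lon})
    return joined

located = add_lat_lon(customers, zip_coords)
# Rows without a match must be geocoded another way or left off the plot.
unmatched = [r["id"] for r in located if r["lat"] is None]
```

Using a left join keeps every observation and makes unmatched locations easy to spot, rather than silently dropping them.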
Model Building - Algorithm Application
To create a model using one of the available data mining algorithms, drag the
modeler (data mining algorithm) over the target dataset and drop it. Before doing
this, however, make sure that the dataset is ready for processing. The modeler will
use all observations and all attributes contained in the dataset. If you don't want
to use all of the data, first create a subset of the data, eliminating any
unnecessary or unwanted attributes and observations.
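Creating a subset amounts to two independent choices: which attributes (columns) to keep and which observations (rows) to keep. The sketch below assumes a hypothetical list-of-dicts dataset and a hypothetical helper make_subset; in the tool itself this step is done interactively, so this only illustrates the logic.

```python
# Hypothetical dataset: each observation is a dict of attribute -> value.
dataset = [
    {"id": 1, "age": 34, "income": 52000, "region": "East", "churned": "no"},
    {"id": 2, "age": None, "income": 61000, "region": "West", "churned": "yes"},
    {"id": 3, "age": 45, "income": 38000, "region": "West", "churned": "no"},
]

def make_subset(rows, keep_attributes, keep_row):
    """Keep only the listed attributes of the observations that pass keep_row."""
    return [
        {name: row[name] for name in keep_attributes}
        for row in rows
        if keep_row(row)
    ]

# Drop the identifier attribute (no predictive meaning) and any
# observation with a missing age, so the modeler never sees them.
subset = make_subset(
    dataset,
    keep_attributes=["age", "income", "region", "churned"],
    keep_row=lambda row: row["age"] is not None,
)
```

Because the modeler consumes everything it is dropped onto, identifiers and incomplete rows like these are exactly what should be removed beforehand.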
Choose a modeler based on the objectives of your data mining and the
capabilities of the modelers. The features of the available modelers are
summarized in Table A.1. They are divided into three categories: cluster analysis,
classification (prediction of a nominal value), and regression (prediction of a numeric
value). Cluster analysis is oriented more toward dataset preparation
(sub-population extraction) than toward a data mining end point. When conducting
classification or regression modeling, it is a good idea to apply multiple modelers
and compare their performance results, because no single modeler works best across all datasets.
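Comparing modelers means building each one on the same training data and scoring each on the same held-out data. The sketch below uses two simple classifiers chosen only for illustration (a majority-class baseline and 1-nearest-neighbor), not the modelers listed in Table A.1, on a small hypothetical dataset.

```python
import math
from collections import Counter

# Hypothetical labeled data: ((feature1, feature2), class label).
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((3.0, 3.2), "B"), ((3.1, 2.9), "B")]
test = [((1.1, 0.9), "A"), ((2.9, 3.0), "B"), ((3.2, 3.1), "B")]

def majority_modeler(rows):
    """Baseline: always predict the most frequent training label."""
    label = Counter(lbl for _, lbl in rows).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbor_modeler(rows):
    """1-nearest-neighbor: predict the label of the closest training point."""
    def predict(x):
        _, label = min(rows, key=lambda r: math.dist(r[0], x))
        return label
    return predict

def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the actual label."""
    return sum(model(x) == lbl for x, lbl in rows) / len(rows)

modelers = {"majority": majority_modeler, "1-NN": nearest_neighbor_modeler}
scores = {name: accuracy(build(train), test) for name, build in modelers.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Including a trivial baseline is a cheap sanity check: a modeler that cannot beat "always predict the most common class" is not capturing anything useful.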
Model Evaluation
Once generated, data mining models should be studied and evaluated from two
perspectives:
How well does the model perform with respect to the training, validation, and
test datasets?
What is the nature of the relationships between inputs and the output
variable?
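Judging performance on the training, validation, and test datasets requires partitioning the data before the model is built and letting the model see only the training partition. The sketch below assumes a hypothetical one-input dataset, a 60/20/20 split, and a 1-nearest-neighbor model chosen only for illustration; three_way_split is a hypothetical helper.

```python
import random

def three_way_split(rows, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle, then partition rows into training, validation, and test sets."""
    rows = rows[:]  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_valid = int(len(rows) * valid_frac)
    return (rows[:n_train],
            rows[n_train:n_train + n_valid],
            rows[n_train + n_valid:])

def nearest_neighbor(train_rows):
    """1-NN on a single numeric input: label of the closest training value."""
    def predict(x):
        return min(train_rows, key=lambda r: abs(r[0] - x))[1]
    return predict

def accuracy(model, rows):
    return sum(model(x) == label for x, label in rows) / len(rows)

# Hypothetical data: one numeric input; label is "high" when x >= 50.
data = [(x, "high" if x >= 50 else "low") for x in range(100)]
train, valid, test = three_way_split(data)
model = nearest_neighbor(train)  # the model only ever sees the training set
report = {name: accuracy(model, part)
          for name, part in [("train", train), ("valid", valid), ("test", test)]}
print(report)
```

Training accuracy is usually optimistic (a 1-NN model scores perfectly on its own training points by construction), so the validation and test figures are the ones that indicate how the model will behave on new data.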
The evaluation approach employed varies with respect to the data mining
objective (classification, regression, or cluster analysis) and the algorithm used
to build the model.
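As a rough illustration of how the evaluation changes with the objective, the sketch below reports accuracy and confusion counts for classification and root mean squared error (RMSE) and mean absolute error for regression. These metric choices and the evaluate helper are assumptions for illustration; cluster analysis is deliberately left out because its evaluation typically relies on inspecting the resulting sub-populations rather than on a single score.

```python
import math
from collections import Counter

def classification_report(actual, predicted):
    """Accuracy plus counts of each (actual, predicted) label pair."""
    pairs = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in pairs.items() if a == p)
    return {"accuracy": correct / len(actual), "confusion": dict(pairs)}

def regression_report(actual, predicted):
    """Root mean squared error and mean absolute error for numeric targets."""
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"rmse": rmse, "mae": mae}

def evaluate(objective, actual, predicted):
    """Dispatch to a metric set appropriate to the data mining objective."""
    if objective == "classification":
        return classification_report(actual, predicted)
    if objective == "regression":
        return regression_report(actual, predicted)
    raise ValueError(f"no metric set defined here for objective {objective!r}")
```

For example, evaluate("regression", [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]) gives an RMSE of about 1.15, while the mean absolute error is about 0.67; RMSE weights the single large miss more heavily.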