Database Reference
In-Depth Information
decision tree model. Similarly, a hierarchical or agglomerative cluster algorithm
will fail to analyzemore than a few thousand records when some of themost recently
developed clustering algorithms, like IBM SPSS Modeler TwoStep Model, can
handle millions without sampling. Within the machine learning algorithms we
can also note substantial differences in terms of speed and required resources,
with neural networks, including SOMs for clustering, among the most demanding
techniques.
Another advantage of machine learning algorithms is that they have less
stringent data assumptions. Thus they are more friendly and simple to use for
those with little experience in the technical aspects of model building. Usually,
statistical algorithms require considerable effort in building. Analysts should spend
time taking into account the data considerations. Merely feeding raw data into
these algorithms will probably yield poor results. Their building may require special
data processing and transformations before they produce results comparable or
even superior to those of machine learning algorithms.
Another aspect that data miners should take into account when choosing a
model technique is the insight provided by each algorithm. In general, statistical
models yield transparent solutions. On the contrary, somemachine learningmodels,
like neural networks, are opaque, conveying little information and knowledge about
the underlying data patterns and customer behaviors. They may provide reliable
customer scores and achieve satisfactory predictive performance, but they provide
little or no reasoning for their predictions. However, among machine learning
algorithms there are models that provide an explanation of the derived results,
like decision trees. Their results are presented in an intuitive and self-explanatory
format, allowing an understanding of the findings. Since most data mining software
packages allow for fast and easy model development, the case of developing one
model for insight and a different model for scoring and deployment is not unusual.
SUMMARY
In the previous sections we presented a brief introduction to the main concepts of
data mining modeling techniques. Models can be grouped into two main classes:
supervised and unsupervised.
Supervised modeling techniques are also referred to as directed or predictive
because their goal is prediction. Models automatically detect or ''learn'' the input
data patterns associated with specific output values. Supervised models are further
grouped into classification and estimation models, according to the measurement
level of the target field. Classification models deal with the prediction of categorical
outcomes. Their goal is to classify new cases into predefined classes. Classification
Search WWH ::




Custom Search