CRISP-DM Step 5: Evaluation
All analyses of data have the potential for false positives. Even if a model doesn't yield false
positives, however, it may still fail to find any interesting patterns in your data. This may be
because the model isn't set up well to find the patterns, because you are using the wrong technique,
or because there simply isn't anything interesting in your data for the model to find. The Evaluation
phase of CRISP-DM exists specifically to help you determine how valuable your model is, and
what you might want to do with it.
Evaluation can be accomplished using a number of techniques, both mathematical and logical in
nature. This topic will examine techniques for cross-validation and testing for false positives using
RapidMiner. For some models, the power or strength indicated by certain test statistics will also be
discussed.
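RapidMiner expresses these evaluation steps as visual operators rather than code, but the underlying
ideas are easy to see in script form. What follows is only a minimal sketch in Python with
scikit-learn (an illustrative assumption, not part of RapidMiner or of this topic's process streams),
using an assumed sample dataset and classifier, to show 10-fold cross-validation and a confusion
matrix whose off-diagonal cells count false positives and false negatives.

# A minimal sketch (Python + scikit-learn, not RapidMiner) of the two
# quantitative evaluation ideas discussed here: k-fold cross-validation
# and counting false positives with a confusion matrix. The dataset and
# classifier are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

X, y = load_breast_cancer(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# 10-fold cross-validation: train on nine folds, test on the held-out
# fold, and repeat so every observation is used for testing exactly once.
scores = cross_val_score(model, X, y, cv=10)
print("Accuracy per fold:", scores.round(3))
print("Mean accuracy: %.3f" % scores.mean())

# Confusion matrix on a simple hold-out split: the off-diagonal cells
# count the false positives and false negatives.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
y_pred = model.fit(X_train, y_train).predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("False positives:", fp, "False negatives:", fn)

Cross-validation guards against a model that merely memorizes its training data, while the confusion
matrix makes the false positive count explicit rather than hiding it inside a single accuracy number.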
Beyond these measures, however, model evaluation must also include a human aspect. As individuals
gain experience and expertise in their field, they develop operational knowledge that may not be
measurable in a mathematical sense but is nonetheless indispensable in determining the value of a
data mining model. This human element will also be discussed throughout the topic. Using both
data-driven and instinctive evaluation techniques to determine a model's usefulness, we can then
decide how to move on to…
CRISP-DM Step 6: Deployment
If you have successfully identified your questions, prepared data that can answer those questions,
and created a model that passes the test of being interesting and useful, then you have arrived at
the point of actually using your results. This is deployment, and it is a happy and busy time for a
data miner. Activities in this phase include setting up and automating your model, meeting with the
consumers of your model's outputs, integrating with existing management or operational information
systems, feeding new learning from model use back into the model to improve its accuracy and
performance, and monitoring and measuring the outcomes of model use (a brief automation sketch
appears below). Be prepared for a bit of
distrust of your model at first; you may even face pushback from groups who feel their jobs
are threatened by this new tool, or who do not trust the reliability or accuracy of its outputs. But
don't let this discourage you! Remember that CBS did not trust the initial predictions of the
UNIVAC, one of the first commercial computer systems, when the network used it to predict the
eventual outcome of the 1952 presidential election on election night. With only 5% of the votes
counted, UNIVAC predicted Dwight D. Eisenhower would defeat Adlai Stevenson in a landslide; the
network's analysts were so skeptical that they held the prediction back, and UNIVAC turned out to
be right.
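To make "setting up and automating your model" a little more concrete, the following is a minimal
sketch in plain Python (an illustration only; the file names, the saved model, and the column layout
are assumptions, and this is not a RapidMiner feature) of a scoring job that could be run on a
schedule: load a persisted model, score the newest batch of records, and write the predictions where
other systems can pick them up.

# A minimal, illustrative sketch of an automated scoring job. It assumes
# a model was trained and saved earlier (e.g., during the Modeling phase)
# and that new records arrive as a CSV file with matching columns.
import pickle

import pandas as pd

MODEL_PATH = "trained_model.pkl"    # hypothetical saved model
INPUT_PATH = "new_records.csv"      # hypothetical batch of fresh data
OUTPUT_PATH = "scored_records.csv"  # output for downstream systems


def score_new_records():
    # Load the persisted model and the latest batch of unscored records.
    with open(MODEL_PATH, "rb") as f:
        model = pickle.load(f)
    new_data = pd.read_csv(INPUT_PATH)

    # Apply the model and keep the predictions alongside the inputs so
    # the consumers of the output can see both.
    new_data["prediction"] = model.predict(new_data)
    new_data.to_csv(OUTPUT_PATH, index=False)


if __name__ == "__main__":
    # In practice this would be run on a schedule (cron, Task Scheduler,
    # or a similar tool) rather than by hand.
    score_new_records()

Monitoring those outputs over time, and feeding corrected outcomes back into model retraining,
covers the remaining deployment activities described above.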