CRISP-DM Step 5: Evaluation
All analyses of data have the potential for false positives. Even if a model doesn't yield false
positives, however, it may still fail to find any interesting patterns in your data. This may be
because the model isn't set up well to find the patterns, because you are using the wrong technique,
or because there simply isn't anything interesting in your data for the model to find. The Evaluation
phase of CRISP-DM exists specifically to help you determine how valuable your model is, and
what you might want to do with it.
Evaluation can be accomplished using a number of techniques, both mathematical and logical in
nature. This topic will examine techniques for cross-validation and testing for false positives using
RapidMiner. For some models, the power or strength indicated by certain test statistics will also be
discussed.
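RapidMiner expresses these evaluation steps as visual operators rather than code, but the underlying
ideas are easy to see in script form. What follows is only a minimal sketch in Python with
scikit-learn (an illustrative assumption, not part of RapidMiner or of this topic's process streams),
using an assumed sample dataset and classifier, to show 10-fold cross-validation and a confusion
matrix whose off-diagonal cells count false positives and false negatives.

# A minimal sketch (Python + scikit-learn, not RapidMiner) of the two
# quantitative evaluation ideas discussed here: k-fold cross-validation
# and counting false positives with a confusion matrix. The dataset and
# classifier are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

X, y = load_breast_cancer(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# 10-fold cross-validation: train on nine folds, test on the held-out
# fold, and repeat so every observation is used for testing exactly once.
scores = cross_val_score(model, X, y, cv=10)
print("Accuracy per fold:", scores.round(3))
print("Mean accuracy: %.3f" % scores.mean())

# Confusion matrix on a simple hold-out split: the off-diagonal cells
# count the false positives and false negatives.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
y_pred = model.fit(X_train, y_train).predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("False positives:", fp, "False negatives:", fn)

Cross-validation guards against a model that merely memorizes its training data, while the confusion
matrix makes the false positive count explicit rather than hiding it inside a single accuracy number.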
Beyond these measures, however, model evaluation must also include a human aspect. As individuals
gain experience and expertise in their field, they develop operational knowledge that may not be
measurable in a mathematical sense but is nonetheless indispensable in determining the value of a
data mining model. This human element will also be discussed throughout the topic. Using both
data-driven and instinctive evaluation techniques to determine a model's usefulness, we can then
decide how to move on to…
CRISP-DM Step 6: Deployment
If you have successfully identified your questions, prepared data that can answer those questions,
and created a model that passes the test of being interesting and useful, then you have arrived at
the point of actually using your results. This is deployment, and it is a happy and busy time for a
data miner. Activities in this phase include setting up and automating your model, meeting with the
consumers of your model's outputs, integrating with existing management or operational information
systems, feeding new learning from model use back into the model to improve its accuracy and
performance, and monitoring and measuring the outcomes of model use (a brief automation sketch
appears below). Be prepared for a bit of
distrust of your model at first; you may even face pushback from groups who feel their jobs
are threatened by this new tool, or who do not trust the reliability or accuracy of its outputs. But
don't let this discourage you! Remember that CBS did not trust the initial predictions of the
UNIVAC, one of the first commercial computer systems, when the network used it to predict the
eventual outcome of the 1952 presidential election on election night. With only 5% of the votes
counted, UNIVAC predicted Dwight D. Eisenhower would defeat Adlai Stevenson in a landslide; the
network's analysts were so skeptical that they held the prediction back, and UNIVAC turned out to
be right.
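To make "setting up and automating your model" a little more concrete, the following is a minimal
sketch in plain Python (an illustration only; the file names, the saved model, and the column layout
are assumptions, and this is not a RapidMiner feature) of a scoring job that could be run on a
schedule: load a persisted model, score the newest batch of records, and write the predictions where
other systems can pick them up.

# A minimal, illustrative sketch of an automated scoring job. It assumes
# a model was trained and saved earlier (e.g., during the Modeling phase)
# and that new records arrive as a CSV file with matching columns.
import pickle

import pandas as pd

MODEL_PATH = "trained_model.pkl"    # hypothetical saved model
INPUT_PATH = "new_records.csv"      # hypothetical batch of fresh data
OUTPUT_PATH = "scored_records.csv"  # output for downstream systems


def score_new_records():
    # Load the persisted model and the latest batch of unscored records.
    with open(MODEL_PATH, "rb") as f:
        model = pickle.load(f)
    new_data = pd.read_csv(INPUT_PATH)

    # Apply the model and keep the predictions alongside the inputs so
    # the consumers of the output can see both.
    new_data["prediction"] = model.predict(new_data)
    new_data.to_csv(OUTPUT_PATH, index=False)


if __name__ == "__main__":
    # In practice this would be run on a schedule (cron, Task Scheduler,
    # or a similar tool) rather than by hand.
    score_new_records()

Monitoring those outputs over time, and feeding corrected outcomes back into model retraining,
covers the remaining deployment activities described above.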