Data Mining Process - Java Data Mining: Strategy, Standard, and Practice


[Figure 3-7: Data mining model build process. Diagram labels: Original Dataset (Data); Sample, Transform, Prepare Data; Transformed Dataset (Data′); Build Settings; Build Model; Model.]

3.3.2 Model Apply

In model apply, the objective is to use the model to make predictions or classify data. This is often referred to as scoring. The data used is called the apply data. When using a data mining model for apply, the apply data should have characteristics similar to the build data (e.g., the same or a subset of the attributes used for model building). We include “subset” here because some algorithms, like decision trees, produce models that use only the most relevant attributes. Hence, during apply, only those attributes need be included.
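
As a rough illustration of the "subset" point, the sketch below keeps only the attributes a model actually uses and rejects an apply record that lacks one of them. It is not the JDM API: the class ApplyRecordProjector, the method project, and the attribute names are hypothetical and chosen only for this example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: reduces an apply record to the attributes a model uses. */
class ApplyRecordProjector {

    /**
     * Returns a copy of the record containing only the model's attributes,
     * in the order the model lists them. Throws if an attribute is missing.
     */
    static Map<String, Object> project(Map<String, Object> record,
                                       List<String> modelAttributes) {
        Map<String, Object> projected = new LinkedHashMap<>();
        for (String name : modelAttributes) {
            if (!record.containsKey(name)) {
                throw new IllegalArgumentException(
                        "Apply data is missing attribute: " + name);
            }
            projected.put(name, record.get(name));
        }
        return projected;
    }

    public static void main(String[] args) {
        // Assumed example: the model was built on many attributes
        // but ended up using only "age" and "income".
        List<String> modelAttributes = List.of("age", "income");

        Map<String, Object> applyRecord = new HashMap<>();
        applyRecord.put("age", 42);
        applyRecord.put("income", 55000);
        applyRecord.put("favoriteColor", "blue"); // not used by the model

        System.out.println(project(applyRecord, modelAttributes));
    }
}
```

Extra attributes in the apply data are simply dropped, while a missing attribute is reported immediately instead of surfacing later as a scoring error.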

The apply data must be transformed in the same way as the build data, using the same statistics that were gathered from the build data for those transformations. Consider an attribute age with values ranging from 10 to 90. If this attribute was binned into 8 bins, each spanning 10 years, the same binning must be applied to the data used for applying the model. If we did not bin the apply data, or binned it into, say, 12 bins, the model would likely produce incorrect results or raise an exception outright. Note that it does not matter if the apply data contains a different age range, say from 5 to 75; the original bin boundaries must still be used.
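
The age example can be sketched in a few lines of Java. This is a minimal illustration, not the JDM API: the class AgeBinner and its methods are hypothetical, and placing out-of-range values (such as age 5) into the first or last bin is an assumption of this sketch, not something the text prescribes. The key point is that the boundaries are computed once from the build-data statistics (minimum 10, maximum 90, 8 bins) and then reused unchanged for the apply data.

```java
/** Hypothetical equal-width binner whose boundaries are fixed at build time. */
class AgeBinner {
    // Upper edge of every bin except the last: 20, 30, ..., 80 for the example.
    private final double[] upperBounds;

    /** Derives the bin boundaries from build-data statistics. */
    AgeBinner(double buildMin, double buildMax, int binCount) {
        double width = (buildMax - buildMin) / binCount;
        upperBounds = new double[binCount - 1];
        for (int i = 0; i < upperBounds.length; i++) {
            upperBounds[i] = buildMin + width * (i + 1);
        }
    }

    /**
     * Maps a value to a bin index in [0, binCount - 1].
     * Assumption: values outside the build range fall into the end bins.
     */
    int binOf(double value) {
        int bin = 0;
        while (bin < upperBounds.length && value >= upperBounds[bin]) {
            bin++;
        }
        return bin;
    }

    public static void main(String[] args) {
        // Boundaries come from the build data (ages 10 to 90, 8 bins).
        AgeBinner binner = new AgeBinner(10, 90, 8);

        // Apply data covers a different range (5 to 75) but reuses the binner.
        double[] applyAges = {5, 10, 19.5, 20, 75};
        for (double age : applyAges) {
            System.out.println(age + " -> bin " + binner.binOf(age));
        }
    }
}
```

Re-deriving the boundaries from the apply data's own minimum and maximum (5 and 75) would shift every bin edge, so the same age could land in a different bin than the one the model was built on.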
