Java Reference
In-Depth Information
Original
Transformed Dataset
Sample,
Transform,
Prepare
Data
Build
Model
Data
Data ´
Model
Build
Settings
Figure 3-7
Data mining model build process.
3.3.2
Model Apply
In model apply, the objective is to use the model to make predictions
or classify data. This is often referred to as scoring . The data used is
called the apply data . When using a data mining model for apply,
the apply data should have characteristics similar to the build data
(e.g., the same or a subset of the attributes used for model build-
ing). We include “subset” here because some algorithms, like deci-
sion trees, produce models that use only the most relevant
attributes. Hence, during apply, only those attributes need be
included.
The apply data must be transformed in the same way as the build
data was transformed, using the same statistics gathered for the
transformations from the build data. Consider an attribute age with
values ranging from 10 to 90. If this attribute were binned into 8 bins,
each with a range of 10 years, this same transformation must be
applied to data used for applying the model. If we did not bin the
data, or binned it into, say, 12 bins, the model would likely produce
incorrect results, if not explicitly raise exceptions. Note that it would
not matter if the apply data contained different age ranges, say from
5 to 75; the original bin boundaries must be used.
Search WWH ::




Custom Search