mining process (e.g., specific data preparation and modeling steps)
has been clearly defined and is largely amenable to automation.
Where human resources are involved in the production of data min-
ing models, standard project scheduling techniques for human re-
sources must be applied.
The business decisions for scheduling and workflows revolve
around when the right data is available and when the information or
knowledge, or mining results, are required. For example, if a busi-
ness goal is to conduct a campaign leading up to the Labor Day holi-
day weekend, there is a hard deadline by which the mining results
must be available, perhaps several weeks in advance so that appropriate
printing activities can occur. If the data used for building the model
or models will not change for the month prior to the needed results,
preparing data and scoring customers for likelihood to respond can
be performed more flexibly. If data preparation and model building
take 3 hours of number-crunching and scoring the potential respondents
takes 6 hours, then, barring hardware, software, or other failures,
these jobs could be scheduled to run the day before the results are
needed.
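The back-scheduling arithmetic in this example (work backward from the deadline by the total running time) can be sketched in Java. This is a minimal sketch, assuming the steps run one after another; the class BackSchedule, the method latestStart, the optional safety margin, and the 9:00 deadline are hypothetical, while the 3-hour and 6-hour durations come from the example.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical helper: back-schedules a sequential mining pipeline from a hard deadline.
final class BackSchedule {
    private BackSchedule() {}

    // Latest start time that still finishes by the deadline, given the
    // durations of the sequential steps plus a safety margin for reruns.
    static LocalDateTime latestStart(LocalDateTime deadline, Duration margin, Duration... steps) {
        Duration total = margin;
        for (Duration step : steps) {
            total = total.plus(step);
        }
        return deadline.minus(total);
    }

    public static void main(String[] args) {
        LocalDateTime deadline = LocalDateTime.of(2024, 8, 1, 9, 0); // hypothetical deadline
        Duration prepAndBuild = Duration.ofHours(3); // data preparation + model build
        Duration scoring = Duration.ofHours(6);      // batch scoring of potential respondents
        LocalDateTime start = latestStart(deadline, Duration.ZERO, prepAndBuild, scoring);
        System.out.println("Latest start: " + start); // prints "Latest start: 2024-08-01T00:00"
    }
}
```

With no margin, the 9-hour pipeline due at 9:00 must start by midnight; a nonzero margin moves the start earlier, leaving room to rerun a step after a hardware or software failure.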
However, if hardware is scarce, or multiple activities are being per-
formed on the same hardware, it may be appropriate to schedule
model building and batch scoring activities during periods of low user
or customer activity, for example, in the middle of the night. Job
scheduling is fairly easy to implement with Java's Timer class
(java.util.Timer), and several other libraries also support job
scheduling, such as Quartz [Quartz 2006]. JDM 2.0 introduces a basic scheduling interface
that unifies scheduling within the JDM Connection execute method
and can be implemented in terms of these specialized libraries.
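Off-peak scheduling with java.util.Timer can be sketched as follows. This is a minimal sketch, assuming the model build and batch scoring can be wrapped in a Runnable; the class OffPeakScheduler, its methods, the 2:00 a.m. run time, and the thread name are hypothetical, and the JDM 2.0 scheduling interface is not shown.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneId;
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical helper: runs a mining job once a day during a low-activity window.
final class OffPeakScheduler {
    private OffPeakScheduler() {}

    // Next occurrence of runAt strictly after now: today if still ahead, otherwise tomorrow.
    static LocalDateTime nextRun(LocalDateTime now, LocalTime runAt) {
        LocalDateTime candidate = now.toLocalDate().atTime(runAt);
        return candidate.isAfter(now) ? candidate : candidate.plusDays(1);
    }

    // Schedules the job daily at runAt on a daemon Timer thread and returns the Timer
    // so the caller can cancel it. The fixed 24-hour period ignores DST shifts.
    static Timer scheduleNightly(Runnable job, LocalTime runAt, ZoneId zone) {
        LocalDateTime first = nextRun(LocalDateTime.now(zone), runAt);
        Date firstTime = Date.from(first.atZone(zone).toInstant());
        Timer timer = new Timer("mining-batch", true);
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                job.run(); // e.g., build the model, then batch-score customers
            }
        }, firstTime, Duration.ofDays(1).toMillis());
        return timer;
    }

    public static void main(String[] args) {
        LocalTime runAt = LocalTime.of(2, 0); // hypothetical off-peak time
        Timer timer = scheduleNightly(
            () -> System.out.println("Building model and batch scoring..."), // placeholder job
            runAt, ZoneId.systemDefault());
        System.out.println("Next run: " + nextRun(LocalDateTime.now(), runAt));
        timer.cancel(); // demo only: a long-running service would keep the Timer alive
    }
}
```

Timer runs all tasks on a single background thread, which suits one nightly pipeline; calendar-style rules (weekdays only, holiday blackouts) or persistence across restarts are the kind of requirements that motivate a dedicated library such as Quartz.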
Some obvious but nonetheless important dependencies exist between
data mining tasks that impact workflow design. As Figure 15-2
depicts, there may be separate workflow steps to support data prepa-
ration; these steps detail which data is brought together and the spe-
cific transformations that must be applied. If this process succeeds, a
particular model or set of models is built according to prespecified
build settings. To ensure that the resulting model(s) are valid, even if
the build terminated successfully, the workflow should test the model
to ensure a minimum level of accuracy or some other clearly defined