Example of a data mining workflow.
metric. If the model tests succeed, the model(s) may be deployed to
their target environment, perhaps another application that is
responsible for scoring, either in batch or real-time. The deployment
may be as simple as providing the name of a data mining model
stored in a database to the remote system, or may involve exporting
the model as XML or a proprietary format and importing it at the
target site. Once in the target environment, the model may be applied
to new data (e.g., to produce scores or segment customers).
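The export-and-import style of deployment described above can be illustrated with a minimal sketch. It assumes a toy linear scoring model; the element names (`model`, `intercept`, `coefficient`) and the functions `export_model` and `import_model` are hypothetical and do not correspond to JDM, PMML, or any vendor format.

```python
import xml.etree.ElementTree as ET


def export_model(name, intercept, coefficients):
    """Serialize a toy linear model to an XML string (hypothetical layout)."""
    root = ET.Element("model", {"name": name})
    ET.SubElement(root, "intercept").text = repr(intercept)
    for attribute, weight in coefficients.items():
        ET.SubElement(root, "coefficient", {"attribute": attribute}).text = repr(weight)
    return ET.tostring(root, encoding="unicode")


def import_model(xml_text):
    """Rebuild a scoring function at the target site from the exported XML."""
    root = ET.fromstring(xml_text)
    intercept = float(root.findtext("intercept"))
    weights = {c.get("attribute"): float(c.text) for c in root.findall("coefficient")}

    def score(record):
        # Attributes missing from the record contribute nothing to the score
        return intercept + sum(w * record.get(a, 0.0) for a, w in weights.items())

    return root.get("name"), score


# Source site: export. Target site: import, then apply to new data.
xml_text = export_model("churn_v1", 0.5, {"age": 0.25, "visits": -1.0})
model_name, score = import_model(xml_text)
new_score = score({"age": 40.0, "visits": 2.0})
```

The simpler deployment option mentioned in the text, passing only the name of a model stored in a database, would replace the XML string with the model name and leave retrieval to the target system.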
Figure 15-2 depicts the success path; however, each workflow
step also requires contingency tasks to handle failures. Depending
on the sophistication of the workflow design, some automated measures
may be taken to correct problems. For example, if test results
do not meet specifications, the workflow could attempt to build
different models with different algorithms and settings and reenter
the test step.
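One way to picture this contingency is a loop that builds candidate models in turn and reenters the test step until one meets the specification. The sketch below is illustrative only: `build_until_passing`, the candidate list, and the threshold are assumptions, and the "models" are stand-in strings rather than real mining models.

```python
def build_until_passing(candidates, test, threshold):
    """Try each (name, build) pair in order and return the first
    (name, model, metric) whose test metric meets the threshold.
    Returns None if no candidate passes."""
    for name, build in candidates:
        try:
            model = build()
        except Exception:
            # A failed build is itself a contingency: try the next candidate
            continue
        metric = test(model)  # reenter the test step with the new model
        if metric >= threshold:
            return name, model, metric
    return None


# Toy stand-ins: each "algorithm" returns a label, and the test looks up
# a fixed accuracy for that label (hypothetical numbers).
candidates = [
    ("decision_tree", lambda: "tree-model"),
    ("naive_bayes", lambda: "nb-model"),
]
accuracy = {"tree-model": 0.71, "nb-model": 0.86}

result = build_until_passing(candidates, accuracy.get, threshold=0.80)
no_result = build_until_passing(candidates, accuracy.get, threshold=0.95)
```

When no candidate passes, the function returns `None`, which a real workflow would presumably treat as a signal to notify an operator rather than deploy.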
Redeploying a model in a production environment requires care. In a
real-time environment, perhaps where cross-sell recommendations are
being made to online customers, models must be refreshed without
interrupting scoring for existing customers. In this case, if the
model switch can be made atomically,² allowing pending requests to
complete and new requests to be directed to the refreshed model,
users should see no unusual effects such as missing recommendations.
Once the old model is no longer servicing requests, it can be retired
from service, having been fully replaced by the new model.
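The atomic switch can be sketched as a single reference swap guarded by a lock. This is a minimal illustration, not how any particular DME implements it: the `ModelSlot` class and its methods are hypothetical, and the models are plain functions.

```python
import threading


class ModelSlot:
    """Holds the model that serves new requests and swaps it in one step."""

    def __init__(self, model):
        self._lock = threading.Lock()
        self._current = model

    def acquire(self):
        # A request takes its reference once and keeps it until it finishes,
        # so a pending request completes with the model it started with.
        with self._lock:
            return self._current

    def swap(self, new_model):
        # The switch is one assignment under the lock: callers see either
        # the old model or the new one, never a partially updated state.
        with self._lock:
            old, self._current = self._current, new_model
        return old  # candidate for retirement once pending requests finish


def recommend(slot, customer):
    model = slot.acquire()
    return model(customer)


slot = ModelSlot(lambda c: f"v1 recommendation for {c}")
pending = slot.acquire()  # a request already in flight
retired = slot.swap(lambda c: f"v2 recommendation for {c}")

old_answer = pending("ann")         # pending request still uses the old model
new_answer = recommend(slot, "bob")  # new request is directed to the new model
```

The sketch does not track when the old model has finished servicing requests; a production system would need some form of in-flight request counting before retiring it.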
Note that both scheduling and workflow environments are slowly being
addressed by business process environments, with frameworks such as
BPEL [BPEL 2004] and BPML [BPML 2003]. Many of these environments
allow external services to be included within business processes and
facilitate integrating services through Web services. JDM also
defines a Web services interface so that DME providers that implement
the Web services layer can easily be integrated into such business
process environments.
² That is, similar to the notion of an atomic database commit
operation, which either happens completely or not at all.