and assessed which specific results should be deployed into business
operations. We will also have identified the hardware and software
resources for performing data mining in production.
It is at this point that having an API is critical for operationalizing
data mining. If a graphical user interface was used for the exploratory
mining, you need either to have kept very good notes about the
steps and settings used to obtain your results, or to be able to generate a
script of the steps performed. In some cases, this script is a combination
of structured and free-form text. Ideally, the tool should generate
an executable script in an appropriate programming language. This
is where JDM re-enters the picture. The generated program can then
be incorporated in the business workflow.
At its simplest, a workflow is a sequence of tasks that are executed
when certain conditions are satisfied. For example, a workflow may
involve human actions: obtain customer information, call customer,
present offer, record response, if response positive then take order,
repeat. A workflow may also be completely automated, involving no
human interaction: receive Internet purchase order, check inventory
database to see whether order can be fulfilled, if yes then submit shipping order.
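The fully automated case can be sketched in a few lines of Python. This is only an illustration of "a sequence of tasks executed when certain conditions are satisfied"; the `Order` class, the `process_order` function, and the dictionary standing in for the inventory database are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Order:
    item: str
    quantity: int


def process_order(order, inventory, shipping_queue):
    """Run the automated workflow for one order; return True if it shipped."""
    # Condition: the order can be fulfilled only if enough stock is on hand.
    if inventory.get(order.item, 0) < order.quantity:
        return False
    # Tasks: reserve the stock, then submit the shipping order.
    inventory[order.item] -= order.quantity
    shipping_queue.append(order)
    return True


# Hypothetical data: a dict stands in for the inventory database.
inventory = {"widget": 5}
shipping_queue = []
shipped = process_order(Order("widget", 2), inventory, shipping_queue)
rejected = process_order(Order("widget", 10), inventory, shipping_queue)
```

The first order is shipped and reduces the stock on hand; the second fails the inventory condition, so no shipping order is submitted for it.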
Workflows, especially those involving data mining, introduce additional
dependencies among tasks as well as a combination of automated
and manual tasks. Such dependencies may be based on time, success
or failure of previous tasks, or explicit human approval. Time dependencies
include starting a task at a specific time or after a specific
duration. Workflows may also be set up to run repeatedly
(e.g., once per week).
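One way to picture these three kinds of dependency is as fields on a task record that a scheduler checks before starting the task. The sketch below assumes a deliberately simple design; `Task`, `is_ready`, and the field names are hypothetical and do not come from JDM or any workflow product.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Task:
    name: str
    # Names of earlier tasks that must have succeeded.
    depends_on: list = field(default_factory=list)
    # Time dependency: earliest moment the task may start.
    not_before: Optional[datetime] = None
    # Whether a person must explicitly approve the task.
    needs_approval: bool = False


def is_ready(task, now, succeeded, approved):
    """Return True when every dependency of `task` is satisfied."""
    if task.not_before is not None and now < task.not_before:
        return False
    if any(dep not in succeeded for dep in task.depends_on):
        return False
    if task.needs_approval and task.name not in approved:
        return False
    return True


# Hypothetical task: deploy only after testing succeeds, not before
# 1 January 2024, and only with human sign-off.
deploy = Task(
    "deploy_model",
    depends_on=["test_model"],
    not_before=datetime(2024, 1, 1),
    needs_approval=True,
)
ready = is_ready(deploy, datetime(2024, 1, 2), {"test_model"}, {"deploy_model"})
```

A scheduler built on this idea would call `is_ready` for each pending task on every pass and start whichever tasks return `True`.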
Consider the data mining workflow illustrated in Figure 3-15,
which supports refreshing a predictive data mining model—used
for cross-sell or response modeling—on a monthly basis using the
latest data from the data warehouse. On the first of each month at
midnight, the system retrieves the data needed for mining. It then prepares
that data using the transformations determined during the
exploratory phases of the data mining process. To increase the
chances of getting a model with adequate accuracy, we build three
models in parallel using different settings, perhaps even different
algorithms: decision tree, neural network, and support vector
machine. These models are tested on the test data and the results
are automatically compared to select the most accurate model.
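The scheduling and model-selection steps of this workflow can be sketched as follows. This is a minimal illustration, not the JDM API: `next_monthly_run` and `select_best_model` are hypothetical helpers, the model building itself is delegated to a caller-supplied function, and the accuracies in the placeholder builder are made-up values used only to exercise the code.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def next_monthly_run(now):
    """Return the next midnight on the first of a month that falls after `now`."""
    if now.month == 12:
        return datetime(now.year + 1, 1, 1)
    return datetime(now.year, now.month + 1, 1)


def select_best_model(build_and_test, algorithms, train_data, test_data):
    """Build one model per algorithm in parallel and return the most accurate.

    `build_and_test` is a hypothetical callable supplied by the caller; it
    must return the accuracy measured on `test_data` as a float.
    """
    with ThreadPoolExecutor(max_workers=len(algorithms)) as pool:
        accuracies = list(
            pool.map(
                lambda name: build_and_test(name, train_data, test_data),
                algorithms,
            )
        )
    # Pair each algorithm with its accuracy and keep the highest-scoring one.
    return max(zip(algorithms, accuracies), key=lambda pair: pair[1])


# Placeholder builder with made-up accuracies, used only to run the sketch.
def fake_build_and_test(name, train_data, test_data):
    return {"decision_tree": 0.5, "neural_network": 0.7, "svm": 0.6}[name]


best = select_best_model(
    fake_build_and_test, ["decision_tree", "neural_network", "svm"], [], []
)
```

In a production setting, the placeholder builder would be replaced by calls into the mining engine, and the three builds would typically run as separate tasks in the workflow system rather than as threads in one process.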