Data Mining Process - Java Data Mining: Strategy, Standard, and Practice

and assessed which specific results should be deployed into business operations. We will also have identified the hardware and software resources for performing data mining in production.

It is at this point that having an API is critical for operationalizing data mining. If a graphical user interface was used for the exploratory mining, you need either to have kept very good notes about the steps and settings used to obtain your results or to be able to generate a script of the steps performed. In some cases, this script is a combination of structured and free-form text. Ideally, the tool should generate an executable script in an appropriate programming language. This is where JDM re-enters the picture. The generated program can then be incorporated into the business workflow.
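
As a rough illustration of what such a generated program might capture, the sketch below records the steps and settings of an exploratory session as data and replays them in order. It uses only the Java standard library, not the JDM API, and the step names and setting values (`PrepareData`, `maxDepth`, and so on) are hypothetical placeholders rather than the output of any particular tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GeneratedMiningScript {
    // Hypothetical record of one step from an exploratory GUI session,
    // together with the settings that produced the accepted result.
    record MiningStep(String name, Map<String, String> settings) {}

    // The recorded steps, in the order they were performed.
    // Step names and setting values are illustrative placeholders.
    static List<MiningStep> recordedSteps() {
        return List.of(
                new MiningStep("PrepareData", Map.of("binning", "quantile", "bins", "10")),
                new MiningStep("BuildModel", Map.of("algorithm", "decisionTree", "maxDepth", "7")),
                new MiningStep("TestModel", Map.of("testFraction", "0.3")));
    }

    // Replays the steps in order. Here each step is only logged, with its
    // settings sorted by key for stable output; a production program would
    // instead invoke the mining API with these settings.
    static List<String> replay(List<MiningStep> steps) {
        List<String> log = new ArrayList<>();
        for (MiningStep step : steps) {
            log.add(step.name() + " " + new TreeMap<>(step.settings()));
        }
        return log;
    }

    public static void main(String[] args) {
        replay(recordedSteps()).forEach(System.out::println);
    }
}
```

Keeping the steps as data means the same sequence can be rerun unattended, which is what lets it be scheduled as part of a business workflow.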

3.5.3 Business Workflow

At its simplest, a workflow is a sequence of tasks that are executed when certain conditions are satisfied. For example, a workflow may involve human actions: obtain customer information, call the customer, present an offer, record the response, take the order if the response is positive, and repeat. A workflow may also be completely automated, involving no human interaction: receive an Internet purchase order, check the inventory database to see whether the order can be fulfilled, and if so, submit a shipping order.
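
The fully automated order example can be sketched as a short conditional sequence. This is a minimal illustration under stated assumptions: an in-memory map stands in for the inventory database, and `PurchaseOrder`, `canFulfill`, and `submitShippingOrder` are hypothetical names, with the shipping step reduced to a print statement.

```java
import java.util.HashMap;
import java.util.Map;

class OrderWorkflow {
    // Hypothetical order: a product identifier and a requested quantity.
    record PurchaseOrder(String productId, int quantity) {}

    // In-memory stand-in for the inventory database: product id -> units on hand.
    private final Map<String, Integer> inventory;

    OrderWorkflow(Map<String, Integer> initialStock) {
        this.inventory = new HashMap<>(initialStock);
    }

    // Condition checked before the final task: is there enough stock?
    boolean canFulfill(PurchaseOrder order) {
        return inventory.getOrDefault(order.productId(), 0) >= order.quantity();
    }

    // Runs the workflow for one received order. The shipping task is
    // executed only when the inventory check succeeds.
    // Returns true if a shipping order was submitted.
    boolean process(PurchaseOrder order) {
        if (!canFulfill(order)) {
            return false;
        }
        inventory.merge(order.productId(), -order.quantity(), Integer::sum);
        submitShippingOrder(order);
        return true;
    }

    // Placeholder for handing the order to a shipping system.
    private void submitShippingOrder(PurchaseOrder order) {
        System.out.println("Shipping order: " + order.quantity() + " x " + order.productId());
    }

    int unitsOnHand(String productId) {
        return inventory.getOrDefault(productId, 0);
    }

    public static void main(String[] args) {
        OrderWorkflow workflow = new OrderWorkflow(Map.of("widget", 5));
        System.out.println(workflow.process(new PurchaseOrder("widget", 3))); // true
        System.out.println(workflow.process(new PurchaseOrder("widget", 3))); // false
    }
}
```

The second order is rejected because only two units remain after the first one ships, so the condition guarding the shipping task is not satisfied.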

Workflows, especially those involving data mining, introduce additional dependencies among tasks, as well as a mix of automated and manual tasks. Such dependencies may be based on time, on the success or failure of previous tasks, or on explicit human approval. Time dependencies include starting a task at a specific time or after a specific duration. Workflows may also be set up to run repeatedly (e.g., once per week).
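
Two of these dependency types, success or failure of earlier tasks and explicit human approval, can be sketched with a small task runner. The `Task` record, the `Status` values, and the approval set are hypothetical constructs for illustration, not part of JDM or any workflow product; time-based triggering is left out here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BooleanSupplier;

class DependencyRunner {
    enum Status { SUCCEEDED, FAILED, SKIPPED }

    // Hypothetical task description: the tasks that must have succeeded
    // first, whether a person must approve it, and the work itself
    // (the action returns true on success, false on failure).
    record Task(String name, List<String> requiresSuccessOf,
                boolean needsApproval, BooleanSupplier action) {}

    // Runs tasks in the given order (assumed to list prerequisites before
    // the tasks that depend on them). A task runs only if all of its
    // prerequisites succeeded and, when required, it has been approved;
    // otherwise it is marked SKIPPED.
    static Map<String, Status> run(List<Task> tasks, Set<String> approvedTasks) {
        Map<String, Status> status = new LinkedHashMap<>();
        for (Task task : tasks) {
            boolean prerequisitesMet = task.requiresSuccessOf().stream()
                    .allMatch(dep -> status.get(dep) == Status.SUCCEEDED);
            boolean approvalMet = !task.needsApproval() || approvedTasks.contains(task.name());
            if (prerequisitesMet && approvalMet) {
                status.put(task.name(),
                        task.action().getAsBoolean() ? Status.SUCCEEDED : Status.FAILED);
            } else {
                status.put(task.name(), Status.SKIPPED);
            }
        }
        return status;
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(
                new Task("extractData", List.of(), false, () -> true),
                new Task("buildModel", List.of("extractData"), false, () -> true),
                new Task("deployModel", List.of("buildModel"), true, () -> true));
        // deployModel is skipped because no one has approved it yet.
        System.out.println(run(tasks, Set.of()));
    }
}
```

A failed task propagates: anything that requires its success is skipped rather than run on incomplete inputs.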

Consider the data mining workflow illustrated in Figure 3-15, which supports refreshing a predictive data mining model (used for cross-sell or response modeling) on a monthly basis using the latest data from the data warehouse. On the first of each month at midnight, the system retrieves the data needed for mining. It then prepares that data using the transformations determined during the exploratory phases of the data mining process. To increase the chances of getting a model with adequate accuracy, we build three models in parallel using different settings, perhaps even different algorithms: decision tree, neural network, and support vector machine. These models are evaluated on the test data, and the results are automatically compared to select the most accurate model.
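
The scheduling and model-selection parts of this workflow can be sketched with the standard `java.time` and `java.util.concurrent` packages. This is an illustration under assumptions, not the workflow from Figure 3-15: the data retrieval and preparation steps are omitted, and each candidate is a hypothetical `Supplier<Double>` standing in for "build this model, then return its accuracy on the test data".

```java
import java.time.LocalDateTime;
import java.time.temporal.TemporalAdjusters;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

class MonthlyRefresh {
    // Time dependency: midnight on the first day of the month following `now`.
    static LocalDateTime nextRun(LocalDateTime now) {
        return now.toLocalDate()
                .with(TemporalAdjusters.firstDayOfNextMonth())
                .atStartOfDay();
    }

    // Builds and tests all candidates in parallel, one thread each, and returns
    // the name of the candidate with the highest test accuracy (ties go to the
    // earliest candidate in iteration order).
    static String selectMostAccurate(Map<String, Supplier<Double>> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidate models");
        }
        ExecutorService pool = Executors.newFixedThreadPool(candidates.size());
        try {
            Map<String, CompletableFuture<Double>> running = new LinkedHashMap<>();
            candidates.forEach((name, buildAndTest) ->
                    running.put(name, CompletableFuture.supplyAsync(buildAndTest, pool)));

            String best = null;
            double bestAccuracy = Double.NEGATIVE_INFINITY;
            for (Map.Entry<String, CompletableFuture<Double>> entry : running.entrySet()) {
                double accuracy = entry.getValue().join(); // waits for that build to finish
                if (accuracy > bestAccuracy) {
                    bestAccuracy = accuracy;
                    best = entry.getKey();
                }
            }
            return best;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Next refresh: " + nextRun(LocalDateTime.now()));
        // Placeholder builders: the constants are dummy values, not measured accuracies.
        Map<String, Supplier<Double>> candidates = new LinkedHashMap<>();
        candidates.put("decisionTree", () -> 0.80);
        candidates.put("neuralNetwork", () -> 0.82);
        candidates.put("supportVectorMachine", () -> 0.81);
        System.out.println("Selected: " + selectMostAccurate(candidates));
    }
}
```

All three builds start before any result is awaited, so the total time is roughly that of the slowest build rather than the sum of all three.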
