Preview of Java Data Mining 2.0 - Java Data Mining: Strategy, Standard, and Practice

and anomaly detection. Further, the expert group strives to round out the existing functionality by including capabilities to apply association rules to generate cross-sell or up-sell recommendations using the apply task, to support multi-target models, and to include unstructured text as predictor attributes. Because the data mining process also involves data preparation and the ability to associate transformations with data mining models, the expert group addresses the ability to specify and perform transformations in a framework that fits with the overall JDM approach.

This chapter looks at some of the features proposed for JDM 2.0, including transformations, time series, apply for association, feature extraction, statistics, multi-target models, and text mining.

18.1 Transformations

A major part of the data preparation phase involves data transformations. Although not limited to the realm of data mining, transformations are often an essential part of the data mining process [Pyle 1999]. JDM 2.0 introduces a framework and representative set of transformations that provide a more seamless relationship between transformations and data mining models.

Traditionally, transformations would be modeled via a graphical user interface (GUI), or would be explicitly programmed (e.g., via SQL where the data is stored in a relational database). It was the application's or user's responsibility to ensure that, if the model was exported to another environment, the transformations came along, since without these transformations the model is effectively useless. Recall from the discussion in Chapter 3 that the apply data needs to be prepared in the same way as the build data, using the same statistics.
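To illustrate why the apply data must reuse the statistics computed from the build data, the following is a minimal sketch in plain Java. It assumes a min-max normalization, and the class `MinMaxNormalization` with its `fit` and `apply` methods is a hypothetical helper invented for this illustration; it is not part of the JDM API.

```java
import java.util.Arrays;

// Hypothetical helper, NOT part of the JDM API. It stores the statistics
// computed from the build data so the same values are reused at apply time.
// Assumption: min-max normalization, i.e. (x - min) / (max - min).
final class MinMaxNormalization {
    final double shift; // subtracted from each value (the build minimum)
    final double scale; // each shifted value is divided by this (build max - build min)

    private MinMaxNormalization(double shift, double scale) {
        this.shift = shift;
        this.scale = scale;
    }

    // Compute the statistics from the build data only.
    static MinMaxNormalization fit(double[] buildData) {
        if (buildData.length == 0) {
            throw new IllegalArgumentException("build data must not be empty");
        }
        double min = Arrays.stream(buildData).min().getAsDouble();
        double max = Arrays.stream(buildData).max().getAsDouble();
        double range = max - min;
        // Guard against a constant column, which would give a zero range.
        return new MinMaxNormalization(min, range == 0.0 ? 1.0 : range);
    }

    // Transform any data set (build or apply) with the stored statistics.
    double[] apply(double[] data) {
        return Arrays.stream(data).map(x -> (x - shift) / scale).toArray();
    }

    public static void main(String[] args) {
        double[] buildData = {10.0, 20.0, 30.0};
        double[] applyData = {15.0, 40.0};

        MinMaxNormalization norm = MinMaxNormalization.fit(buildData);

        System.out.println(Arrays.toString(norm.apply(buildData))); // [0.0, 0.5, 1.0]
        System.out.println(Arrays.toString(norm.apply(applyData))); // [0.25, 1.5]
    }
}
```

Note that the apply value 40.0 maps to 1.5, outside the range seen at build time. Recomputing the minimum and maximum from the apply data would instead map it to 1.0, which is not the scale the model was built on.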

JDM 2.0 integrates transformations at two levels: first at the task level, and second at the model level. At the task level, transformations are specified as a sequence of settings, where each settings object corresponds to a type of transformation. For example, there are numerical binning transformations, normalization transformations, and a sample transformation. A transformation task allows users to execute a transformation sequence to (1) compute transformation statistics, (2) produce a reusable transformation sequence object, and/or (3) apply a transformation sequence to data. An example of transformation statistics involves the shift and scale from a normalization transformation, which can be based on the maximum and
