and anomaly detection. Further, the expert group strives to round out
the existing functionality by including capabilities to apply associa-
tion rules to generate cross-sell or up-sell recommendations using the
apply task, to support multi-target models, and to include unstruc-
tured text as predictor attributes. Because the data mining process
also involves data preparation and the ability to associate transforma-
tions with data mining models, the expert group addresses the ability
to specify and perform transformations in a framework that fits with
the overall JDM approach.
This chapter looks at some of the features proposed for JDM 2.0,
including transformations, time series, apply for association, feature
extraction, statistics, multi-target models, and text mining.
A major part of the data preparation phase involves data
transformations. Although not limited to the realm of data mining,
transformations are often an essential part of the data mining process
[Pyle 1999]. JDM 2.0 introduces a framework and a representative set
of transformations that provide a more seamless relationship between
transformations and data mining models.
Traditionally, transformations were modeled via a graphical user
interface (GUI) or explicitly programmed (e.g., in SQL where the data
is stored in a relational database). It was the application's or
user's responsibility to ensure that, if the model was exported to
another environment, the transformations came along, since without
them the model is effectively useless. Recall from the discussion in
Chapter 3 that the apply data needs to be prepared in the same way,
using the same statistics, as the build data.
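To make the point about shared statistics concrete, the following is a minimal sketch in plain Java of a min-max normalization whose statistics are computed once from the build data and then reused, unchanged, on the apply data. It does not use the JDM 2.0 interfaces; the class and method names (NormalizationSketch, MinMaxStats, fit, transform) and the sample values are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

public class NormalizationSketch {

    /** Hypothetical holder for the statistics learned from the build data. */
    static final class MinMaxStats {
        final double shift; // value subtracted from each input (the build minimum)
        final double scale; // divisor (the build maximum minus the build minimum)

        MinMaxStats(double shift, double scale) {
            this.shift = shift;
            this.scale = scale;
        }
    }

    /** Computes shift and scale from the build data only. */
    static MinMaxStats fit(double[] buildValues) {
        double min = Arrays.stream(buildValues).min().getAsDouble();
        double max = Arrays.stream(buildValues).max().getAsDouble();
        double range = max - min;
        // A constant column has zero range; use 1.0 to avoid dividing by zero.
        return new MinMaxStats(min, range == 0.0 ? 1.0 : range);
    }

    /** Applies previously computed statistics to any data set. */
    static double[] transform(double[] values, MinMaxStats stats) {
        return Arrays.stream(values)
                     .map(v -> (v - stats.shift) / stats.scale)
                     .toArray();
    }

    public static void main(String[] args) {
        double[] build = {20.0, 35.0, 50.0, 80.0}; // illustrative build data
        double[] apply = {35.0, 95.0};             // illustrative apply data

        MinMaxStats stats = fit(build);            // statistics come from build data

        System.out.println(Arrays.toString(transform(build, stats)));
        System.out.println(Arrays.toString(transform(apply, stats)));
    }
}
```

Under these assumptions, the apply value 95.0 maps to 1.25, which lies outside the range seen during the build. Recomputing the minimum and maximum on the apply data would hide this and would give the value 35.0 a different encoding than the one the model was built with, which is why the statistics must travel with the model.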
JDM 2.0 integrates transformations at two levels: first at the task
level, and second at the model level. At the task level, transforma-
tions are specified as a sequence of settings, where each settings
object corresponds to a type of transformation. For example, there are
numerical binning transformations, normalization transformations,
and a sample transformation. A transformation task allows users to
execute a transformation sequence to (1) compute transformation
statistics, (2) produce a reusable transformation sequence object,
and/or (3) apply a transformation sequence to data. One example of
transformation statistics is the shift and scale of a normalization
transformation, which can be based on the maximum and