minimum value range of the attribute. Transformation settings
may be quite simple, such as the percentage of cases to include
in a random sample. Others may be more involved, such as
providing a default normalization strategy for all input attributes
while overriding several of these attributes with different
normalization parameters.
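The "default plus overrides" pattern can be sketched in a few lines of Java. This is a minimal illustration, not the JDM API: the class NormalizationSettingsSketch, its Strategy enum, and the methods overrideFor and strategyFor are hypothetical names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; these names are not JDM interfaces.
class NormalizationSettingsSketch {

    enum Strategy { MIN_MAX, Z_SCORE, NONE }

    private final Strategy defaultStrategy;
    private final Map<String, Strategy> overrides = new HashMap<>();

    NormalizationSettingsSketch(Strategy defaultStrategy) {
        this.defaultStrategy = defaultStrategy;
    }

    // Register a per-attribute override; returns this to allow chaining.
    NormalizationSettingsSketch overrideFor(String attributeName, Strategy strategy) {
        overrides.put(attributeName, strategy);
        return this;
    }

    // Resolve the strategy for an attribute: the override if one exists,
    // otherwise the default that covers all input attributes.
    Strategy strategyFor(String attributeName) {
        return overrides.getOrDefault(attributeName, defaultStrategy);
    }

    public static void main(String[] args) {
        NormalizationSettingsSketch settings =
                new NormalizationSettingsSketch(Strategy.MIN_MAX)
                        .overrideFor("income", Strategy.Z_SCORE)
                        .overrideFor("zipCode", Strategy.NONE);
        for (String name : new String[] {"age", "income", "zipCode"}) {
            System.out.println(name + " -> " + settings.strategyFor(name));
        }
    }
}
```

Here "age" falls back to the default strategy, while "income" and "zipCode" use their own settings.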
At the model level, a model build task can be augmented with a
transformation sequence object. The specified transformations are
then applied to the data, and the resulting transformation sequence
is associated with the model. The model can therefore reference its
transformations, and when the model is exported, the transformations
are exported with it. Similarly, when the model is applied to new
data, the transformations associated with the model can be applied
automatically to the apply data.
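The idea of a model carrying its own transformations can be sketched as follows. This is a toy illustration under stated assumptions, not the JDM interfaces: TransformationSequenceSketch and ThresholdModelSketch are hypothetical classes, and the "model" is reduced to a single threshold on one numeric attribute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

// Hypothetical illustration only; not the JDM transformation sequence interface.
class TransformationSequenceSketch {
    private final List<DoubleUnaryOperator> steps = new ArrayList<>();

    TransformationSequenceSketch add(DoubleUnaryOperator step) {
        steps.add(step);
        return this;
    }

    // Apply every step, in the order it was added, to one attribute value.
    double apply(double value) {
        double result = value;
        for (DoubleUnaryOperator step : steps) {
            result = step.applyAsDouble(result);
        }
        return result;
    }
}

// A toy single-attribute "model" that keeps a reference to the sequence
// used at build time, so callers can pass raw values at apply time.
class ThresholdModelSketch {
    private final TransformationSequenceSketch sequence;
    private final double threshold;

    ThresholdModelSketch(TransformationSequenceSketch sequence, double threshold) {
        this.sequence = sequence;
        this.threshold = threshold;
    }

    // The associated transformations run first; the model logic then
    // sees only the transformed value.
    boolean apply(double rawValue) {
        return sequence.apply(rawValue) > threshold;
    }

    public static void main(String[] args) {
        // Min-max normalization of a value assumed to lie in [10, 100].
        TransformationSequenceSketch sequence = new TransformationSequenceSketch()
                .add(v -> v - 10.0)
                .add(v -> v / 90.0);
        ThresholdModelSketch model = new ThresholdModelSketch(sequence, 0.5);
        System.out.println(model.apply(100.0));
        System.out.println(model.apply(19.0));
    }
}
```

Because the sequence travels with the model object, anything that serializes or exports the model would carry the transformations along with it.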
Transformations can be divided into several categories:
• case filtering: transformations that affect the number of
cases (or rows) in the dataset, either by eliminating cases or
by dividing the data into separate datasets, as in sampling or
in splitting data for building and testing.
• attribute filtering: transformations that affect the set of
attributes remaining in a dataset. This can include indicating
which attributes to exclude or include in the set according to
• attribute altering: transformations that replace the values of
attributes with new values, perhaps of a different data type
(e.g., binning a numerical attribute into string-identified
bins). These transformations may require obtaining statistics
on attributes to perform the transformation.
• attribute creating: transformations that create new
attributes, leaving existing attributes intact. These new
attributes may be derived from multiple or single attributes.
• pure function-oriented: transformations that apply a
function to each value in an attribute (e.g., square root, log,
and so on).
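To make the categories concrete, the sketch below applies one transformation of each kind to a tiny in-memory dataset. It is an illustration only: the class TransformationCategoriesSketch, the attribute names, and the bin labels are all hypothetical and are not taken from JDM.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; rows are plain maps, not JDM data objects.
class TransformationCategoriesSketch {

    static Map<String, Object> row(int id, int age, double income, int householdSize) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put("age", age);
        r.put("income", income);
        r.put("householdSize", householdSize);
        return r;
    }

    static List<Map<String, Object>> demo() {
        List<Map<String, Object>> input = new ArrayList<>();
        input.add(row(1, 17, 0.0, 1));
        input.add(row(2, 35, 40000.0, 4));
        input.add(row(3, 52, 90000.0, 3));

        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> original : input) {
            int age = (Integer) original.get("age");
            double income = (Double) original.get("income");
            int householdSize = (Integer) original.get("householdSize");

            // Case filtering: eliminate rows for minors.
            if (age < 18) {
                continue;
            }
            Map<String, Object> r = new LinkedHashMap<>(original);

            // Attribute filtering: exclude the identifier attribute.
            r.remove("id");

            // Attribute creating: derive a new attribute from two
            // existing ones, leaving both of them in place.
            r.put("incomePerPerson", income / householdSize);

            // Pure function-oriented: apply square root to each income value.
            r.put("income", Math.sqrt(income));

            // Attribute altering: replace numeric age with a string bin label.
            r.put("age", age < 40 ? "under40" : "40plus");

            out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map<String, Object> r : demo()) {
            System.out.println(r);
        }
    }
}
```

Note that binning changes the data type of "age" from a number to a string, which is the kind of type change the attribute altering category allows.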
JDM 2.0 provides interfaces for addressing each of these categories as
part of TransformationSettings subclasses:
CaseFilteringTransformationSettings,
AttributeFilteringTransformationSettings, and
AttributeTransformationSettings.
The AttributeTransformationSettings include missing
value and outlier treatment, normalization, binning, explosion, recoding,