and general expressions. A key goal for JDM 2.0 transformations is to
allow the transformation of very wide (in terms of number of
attributes) datasets with minimal specification. For example, if a
genomics dataset containing microarray data on 5,000 genes needs to
have attribute values normalized between 0 and 1, the program
should not have to specify the same normalization on each of the
5,000 attributes. Instead, JDM 2.0 allows users to specify a default
normalization of all applicable attributes in the input dataset.
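As a rough illustration of what a default normalization saves, the following Java sketch rescales every column of a numeric table to the range 0 to 1 in a single call, with no per-attribute specification. It is not the JDM 2.0 API: the class and method names are hypothetical, min-max scaling is only one assumed choice of normalization, and the sketch treats every column as an "applicable" (numeric) attribute.

```java
import java.util.Arrays;

// Hypothetical sketch of a "default" normalization; not the JDM 2.0 API.
final class DefaultNormalizationSketch {

    /** Rescales every column of the table to [0, 1] using that column's own min and max. */
    static double[][] normalizeAllColumns(double[][] rows) {
        if (rows.length == 0) return new double[0][];
        int cols = rows[0].length;
        double[] min = new double[cols];
        double[] max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);

        // First pass: find the observed range of each column.
        for (double[] row : rows) {
            for (int c = 0; c < cols; c++) {
                min[c] = Math.min(min[c], row[c]);
                max[c] = Math.max(max[c], row[c]);
            }
        }

        // Second pass: apply the same rule to every column.
        double[][] out = new double[rows.length][cols];
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < cols; c++) {
                double range = max[c] - min[c];
                // A column with a single distinct value has no range; map it to 0.
                out[r][c] = range == 0.0 ? 0.0 : (rows[r][c] - min[c]) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two columns stand in for the 5,000 gene attributes.
        double[][] expression = { {2.0, 10.0}, {4.0, 30.0}, {6.0, 20.0} };
        System.out.println(Arrays.deepToString(normalizeAllColumns(expression)));
        // prints [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
    }
}
```

The caller states the rule once; the loop over columns takes the place of the 5,000 per-attribute specifications.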
Consider the example illustrated in Figure 18-1, which shows a
TransformationSettingsSequence instance consisting of three
transformations: first, take a 20 percent sample of the data; then,
remove columns that are determined to be “constants” (i.e., having the
same value for 95 percent of the entries); and lastly, bin the income
attribute into two bins.
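The three steps just described can be sketched in plain Java as an ordered list of functions applied one after another. This is only an illustration under stated assumptions, not the JDM 2.0 API: the names SettingsSequenceSketch, Step, sample, removeConstants, bin, apply, and demoRows are hypothetical, the sample is drawn row by row (so its size is only approximately 20 percent), and the binning is assumed to be equal-width, which the text does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a three-step transformation sequence; not the JDM 2.0 API.
final class SettingsSequenceSketch {

    /** One step: takes a list of rows (attribute name -> value) and returns a new list. */
    interface Step extends UnaryOperator<List<Map<String, Object>>> {}

    /** Keeps each row with probability `fraction` (seeded so runs are repeatable). */
    static Step sample(double fraction, long seed) {
        return rows -> {
            Random rng = new Random(seed);
            List<Map<String, Object>> kept = new ArrayList<>();
            for (Map<String, Object> row : rows) {
                if (rng.nextDouble() < fraction) kept.add(row);
            }
            return kept;
        };
    }

    /** Drops attributes whose most frequent value covers at least `threshold` of the rows. */
    static Step removeConstants(double threshold) {
        return rows -> {
            if (rows.isEmpty()) return rows;
            List<String> drop = new ArrayList<>();
            for (String attr : rows.get(0).keySet()) {
                Map<Object, Integer> counts = new HashMap<>();
                int top = 0;
                for (Map<String, Object> row : rows) {
                    top = Math.max(top, counts.merge(row.get(attr), 1, Integer::sum));
                }
                if (top >= threshold * rows.size()) drop.add(attr);
            }
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map<String, Object> row : rows) {
                Map<String, Object> copy = new LinkedHashMap<>(row);
                copy.keySet().removeAll(drop);
                out.add(copy);
            }
            return out;
        };
    }

    /** Replaces a numeric attribute with an equal-width bin index in [0, bins - 1]. */
    static Step bin(String attr, int bins) {
        return rows -> {
            // Nothing to do if the attribute is absent (e.g., removed by an earlier step).
            if (rows.isEmpty() || !rows.get(0).containsKey(attr)) return rows;
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (Map<String, Object> row : rows) {
                double v = ((Number) row.get(attr)).doubleValue();
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
            double width = (max - min) / bins;
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map<String, Object> row : rows) {
                double v = ((Number) row.get(attr)).doubleValue();
                // The maximum value would land in bin `bins`, so clamp it into the last bin.
                int index = width == 0.0 ? 0 : (int) Math.min(bins - 1, Math.floor((v - min) / width));
                Map<String, Object> copy = new LinkedHashMap<>(row);
                copy.put(attr, index);
                out.add(copy);
            }
            return out;
        };
    }

    /** Applies the steps in order, feeding each step's output to the next. */
    static List<Map<String, Object>> apply(List<Step> sequence, List<Map<String, Object>> rows) {
        for (Step step : sequence) rows = step.apply(rows);
        return rows;
    }

    /** Builds made-up demo data: a nearly constant "country" column and a varying "income". */
    static List<Map<String, Object>> demoRows(int n) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("country", i % 50 == 0 ? "CA" : "US");
            row.put("income", 1000.0 * i);
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Step> sequence = Arrays.asList(sample(0.20, 42L), removeConstants(0.95), bin("income", 2));
        List<Map<String, Object>> result = apply(sequence, demoRows(1000));
        System.out.println(result.size() + " rows; first row: " + (result.isEmpty() ? "none" : result.get(0)));
    }
}
```

Order matters in such a sequence: here the "constant" test and the bin boundaries are computed on the sampled rows, not on the full dataset.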
We can now define a task, which consists of a PhysicalDataSet and
TransformationSettingsSequence instance, to perform the
transformations and produce a TransformationSequence object, which is
depicted in Figure 18-2.
[Figure 18-1: Transformations settings sequence example. Recoverable
box labels: "Take 20% Sample of the Data"; "Bin Column 'Income' into
Two Bins".]