Preview of Java Data Mining 2.0 - Java Data Mining: Strategy, Standard, and Practice


and general expressions. A key goal for JDM 2.0 transformations is to allow the transformation of very wide (in terms of number of attributes) datasets with minimal specification. For example, if a genomics dataset containing microarray data on 5,000 genes needs to have attribute values normalized between 0 and 1, the program should not have to specify the same normalization on each of the 5,000 attributes. Instead, JDM 2.0 allows users to specify a default normalization of all applicable attributes in the input dataset.
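The idea of one default normalization covering every applicable attribute can be illustrated outside of JDM. The following is a minimal, hypothetical Java sketch, not the JDM 2.0 API: the class `DefaultNormalization` and its method `normalizeAllColumns` are invented for illustration, the dataset is assumed to be a plain `double[][]` whose columns are all numeric, and min-max scaling is assumed as the normalization method. A single call rescales every column to [0, 1], however many columns there are.

```java
import java.util.Arrays;

// Hypothetical illustration, NOT the JDM 2.0 API: one "default" min-max
// normalization rule applied to every column of a wide dataset.
public class DefaultNormalization {

    /** Rescales each column of data (rows x columns) into [0, 1], in place. */
    static void normalizeAllColumns(double[][] data) {
        if (data.length == 0) return;
        int cols = data[0].length;
        for (int c = 0; c < cols; c++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                min = Math.min(min, row[c]);
                max = Math.max(max, row[c]);
            }
            double range = max - min;
            for (double[] row : data) {
                // A constant column has zero range; map it to 0 to avoid dividing by zero.
                row[c] = range == 0 ? 0.0 : (row[c] - min) / range;
            }
        }
    }

    public static void main(String[] args) {
        // Three samples x four "genes"; no per-column specification is needed.
        double[][] data = {
            {2.0, 10.0, 5.0, 7.0},
            {4.0, 20.0, 5.0, 1.0},
            {6.0, 30.0, 5.0, 4.0}
        };
        normalizeAllColumns(data);
        for (double[] row : data) System.out.println(Arrays.toString(row));
    }
}
```

The same call would handle 5,000 columns as easily as four; the zero-range rule for constant columns is a choice made by this sketch, not something the text specifies.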

Consider the example illustrated in Figure 18-1, which shows a TransformationSettingsSequence instance consisting of three transformations: first take a 20 percent sample of the data, then remove columns that are determined to be “constants” (i.e., having the same value for 95 percent of the entries), and lastly, bin the income attribute into two bins.
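The three-step sequence can be modeled as an ordered list of table-to-table functions. This is a hypothetical Java sketch, not the JDM 2.0 API: `SettingsSequenceSketch`, `sample`, `removeConstants`, `binIntoTwo`, and `run` are invented names, and rows are assumed to be maps from attribute name to numeric value. The text does not say how the two income bins are chosen, so equal-width bins split at the midpoint of the observed range are an assumption; a column counts as constant here when its most frequent value covers at least 95 percent of the rows. Requires Java 9 or later (`List.of`).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.UnaryOperator;

// Hypothetical sketch, NOT the JDM 2.0 API: the steps of the example settings
// sequence modeled as functions from a table (list of rows) to a new table.
public class SettingsSequenceSketch {

    /** Step 1: a random sample holding the given fraction of rows (seeded for repeatability). */
    static UnaryOperator<List<Map<String, Double>>> sample(double fraction, long seed) {
        return rows -> {
            List<Map<String, Double>> copy = new ArrayList<>(rows);
            Collections.shuffle(copy, new Random(seed));
            int n = (int) Math.round(fraction * copy.size());
            return new ArrayList<>(copy.subList(0, n));
        };
    }

    /** Step 2: drop attributes whose most frequent value covers at least `threshold` of the rows. */
    static UnaryOperator<List<Map<String, Double>>> removeConstants(double threshold) {
        return rows -> {
            if (rows.isEmpty()) return rows;
            List<String> keep = new ArrayList<>();
            for (String attr : rows.get(0).keySet()) {
                Map<Double, Integer> counts = new HashMap<>();
                for (Map<String, Double> r : rows) counts.merge(r.get(attr), 1, Integer::sum);
                int top = Collections.max(counts.values());
                if ((double) top / rows.size() < threshold) keep.add(attr);
            }
            List<Map<String, Double>> out = new ArrayList<>();
            for (Map<String, Double> r : rows) {
                Map<String, Double> copy = new LinkedHashMap<>();
                for (String attr : keep) copy.put(attr, r.get(attr));
                out.add(copy);
            }
            return out;
        };
    }

    /** Step 3: replace `attr` with bin 0 or 1, split at the midpoint of its range (assumed method). */
    static UnaryOperator<List<Map<String, Double>>> binIntoTwo(String attr) {
        return rows -> {
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (Map<String, Double> r : rows) {
                min = Math.min(min, r.get(attr));
                max = Math.max(max, r.get(attr));
            }
            double mid = (min + max) / 2;
            List<Map<String, Double>> out = new ArrayList<>();
            for (Map<String, Double> r : rows) {
                Map<String, Double> copy = new LinkedHashMap<>(r);
                copy.put(attr, r.get(attr) < mid ? 0.0 : 1.0);
                out.add(copy);
            }
            return out;
        };
    }

    /** Applies the steps in order, as in the three-step example. */
    static List<Map<String, Double>> run(List<UnaryOperator<List<Map<String, Double>>>> steps,
                                         List<Map<String, Double>> rows) {
        for (UnaryOperator<List<Map<String, Double>>> step : steps) rows = step.apply(rows);
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> data = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            Map<String, Double> row = new LinkedHashMap<>();
            row.put("income", 1000.0 * i);
            row.put("flag", 1.0); // identical in every row, so it is a "constant"
            data.add(row);
        }
        List<Map<String, Double>> result = run(List.of(
                sample(0.20, 42L), removeConstants(0.95), binIntoTwo("income")), data);
        System.out.println(result.size() + " rows, attributes " + result.get(0).keySet());
    }
}
```

Running `main` keeps 20 of the 100 rows, drops the constant `flag` column, and leaves `income` holding bin numbers. Keeping each step as a separate function means the order of the sequence is explicit and steps can be reused or reordered.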

We can now define a task that combines a PhysicalDataSet with the TransformationSettingsSequence instance to perform the transformations and produce a TransformationSequence object, as depicted in Figure 18-2.
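The division of labor in the task (the settings describe what to do, the dataset supplies the data, and executing the task yields a result object) can be sketched with plain Java records. This is a hypothetical illustration, not the JDM 2.0 API: every type name below is invented, the dataset is reduced to a single numeric column, and the steps shown (min-max normalization, then two equal-width bins) are chosen for brevity. The idea that the resulting sequence stores parameters learned from the data (here the min, max, and bin midpoint) so it can be reapplied to new values is an assumption made by analogy with how a build task produces a model; the text does not state it. Requires Java 16 or later (records).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch, NOT the JDM 2.0 API. A settings sequence says WHAT to do,
// a dataset says ON WHAT, and a task executes the pair to yield a reusable result.
public class TransformationTaskSketch {

    /** One step's settings; fitting it to data yields a concrete transformation. */
    interface StepSettings {
        DoubleUnaryOperator fit(double[] column);
    }

    /** Min-max normalization into [0, 1]; min and max are learned during fit. */
    record MinMaxSettings() implements StepSettings {
        public DoubleUnaryOperator fit(double[] col) {
            double min = Arrays.stream(col).min().orElse(0);
            double max = Arrays.stream(col).max().orElse(0);
            double range = max - min;
            return v -> range == 0 ? 0.0 : (v - min) / range;
        }
    }

    /** Two equal-width bins (an assumed method); the midpoint is learned during fit. */
    record TwoBinSettings() implements StepSettings {
        public DoubleUnaryOperator fit(double[] col) {
            double mid = (Arrays.stream(col).min().orElse(0) + Arrays.stream(col).max().orElse(0)) / 2;
            return v -> v < mid ? 0.0 : 1.0;
        }
    }

    /** The task's result: fitted steps that can be applied, in order, to new values. */
    record FittedSequence(List<DoubleUnaryOperator> steps) {
        double apply(double v) {
            for (DoubleUnaryOperator s : steps) v = s.applyAsDouble(v);
            return v;
        }
    }

    /** The task: a data column plus a settings sequence; execute() fits each step in turn. */
    record TransformationTask(double[] column, List<StepSettings> settingsSequence) {
        FittedSequence execute() {
            double[] current = column.clone();
            List<DoubleUnaryOperator> fitted = new ArrayList<>();
            for (StepSettings s : settingsSequence) {
                DoubleUnaryOperator step = s.fit(current);
                fitted.add(step);
                // Each later step is fitted on the output of the earlier ones.
                for (int i = 0; i < current.length; i++) current[i] = step.applyAsDouble(current[i]);
            }
            return new FittedSequence(List.copyOf(fitted));
        }
    }

    public static void main(String[] args) {
        double[] income = {20_000, 35_000, 50_000, 80_000, 120_000};
        FittedSequence seq = new TransformationTask(income,
                List.of(new MinMaxSettings(), new TwoBinSettings())).execute();
        System.out.println(seq.apply(60_000));  // normalizes to 0.4, below the 0.5 midpoint: bin 0.0
        System.out.println(seq.apply(110_000)); // normalizes to 0.9: bin 1.0
    }
}
```

Separating the settings from the fitted result keeps the specification declarative, while the learned values travel with the result object rather than with the settings.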

[Figure 18-1: TransformationSettingsSequence-1 contains three steps: Take 20% Sample of the Data; Remove 95% “Constant” Attributes; Bin Column 'Income' into Two Bins. TransformationTask-1 combines TransformationSettingsSequence-1 with PhysicalDataset-1.]

Figure 18-1 Transformations settings sequence example.
