Recall that in Listing 9-4 we used simple data specifications, in which only the data location is specified and all vendor defaults for the data specification are accepted. In practice, however, data specifications also involve the attribute roles, attribute types, data preparation requirements, and attribute usage types that we discussed in Section 7.1.
Listing 9-7 extends the code in Listing 9-4 with additional data specifications: physical data specifications that assign the case id role to the cust_id attribute, and logical data specifications that define the valid values and the data preparation status of the marital status attribute. The init and input methods in Listing 9-7 create PhysicalAttribute, LogicalData, CategorySet, and LogicalAttribute objects to specify these additional settings. Lines 10 to 17 create the object factories associated with these objects. Lines 22 and 23 create the physical attribute object for the cust_id attribute, which has the integer data type and the case id role, so that it uniquely identifies each customer case. Line 26 creates the LogicalData object, and line 27 creates the LogicalAttribute object that specifies the name as Marital Status and the attribute type as categorical.
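To make the distinction between the physical and logical specifications concrete, here is a minimal, self-contained Java sketch. The classes PhysicalAttr and LogicalAttr, the enums, and the helper hasUniqueCaseIds are hypothetical stand-ins invented for illustration; they are not the javax.datamining classes, whose objects are created through the factories mentioned above. The sketch also assumes that a case id value must be non-null and unique, which is what lets it identify each customer case.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-ins for the JDM objects described in the text;
// these are NOT the javax.datamining classes or their signatures.
public class AttributeSpecSketch {

    enum DataType { INTEGER, DOUBLE, STRING }
    enum Role { CASE_ID, DATA }
    enum AttributeType { CATEGORICAL, NUMERICAL, ORDINAL }

    // Physical view: column name, storage data type, and role in the dataset.
    static final class PhysicalAttr {
        final String name;
        final DataType dataType;
        final Role role;

        PhysicalAttr(String name, DataType dataType, Role role) {
            this.name = name;
            this.dataType = dataType;
            this.role = role;
        }
    }

    // Logical view: a descriptive name and how algorithms should treat the values.
    static final class LogicalAttr {
        final String name;
        final AttributeType type;

        LogicalAttr(String name, AttributeType type) {
            this.name = name;
            this.type = type;
        }
    }

    // Assumption of this sketch: a case id column must contain no nulls and no duplicates.
    static boolean hasUniqueCaseIds(List<Integer> ids) {
        Set<Integer> seen = new HashSet<>();
        for (Integer id : ids) {
            if (id == null || !seen.add(id)) {
                return false; // missing or repeated id
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PhysicalAttr custId = new PhysicalAttr("cust_id", DataType.INTEGER, Role.CASE_ID);
        LogicalAttr marital = new LogicalAttr("Marital Status", AttributeType.CATEGORICAL);

        System.out.println(custId.name + " role=" + custId.role);    // cust_id role=CASE_ID
        System.out.println(marital.name + " type=" + marital.type);  // Marital Status type=CATEGORICAL
        System.out.println(hasUniqueCaseIds(List.of(101, 102, 103))); // true
        System.out.println(hasUniqueCaseIds(List.of(101, 102, 101))); // false
    }
}
```

Keeping the two views in separate objects mirrors the text: the physical attribute describes how cust_id is stored and used, while the logical attribute describes how Marital Status should be interpreted by the algorithm.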
The DME uses implementation-specific defaults for attributes that have no associated logical attribute. In this simple example, only the marital status attribute has a logical specification; in practice, however, many or all attributes may have logical specifications. Lines 30 to 35 show the creation of the category set that specifies the valid values of the marital status attribute: married, single, divorced, and widowed. A category set is an optional logical specification for categorical attributes that informs the model build operation of valid, missing, and invalid category values. When there is no category set specification, algorithms use the JDM implementation defaults to identify missing values; for example, null values are typically interpreted as missing. Unless otherwise specified, all values of an attribute are typically considered valid.
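The classification rules described above can be sketched in a few lines of Java. CategorySetSketch and its methods are hypothetical and only loosely modeled on the idea of a JDM category set; they are not the JDM API. One behavior is an assumption of this sketch rather than something stated in the text: when a category set exists, a non-null value that is not listed in it is treated as invalid.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical category set: NOT the javax.datamining API, just an illustration.
public class CategorySetSketch {

    enum CategoryProperty { VALID, MISSING, INVALID }

    // Explicit value -> property assignments declared by the user.
    private final Map<String, CategoryProperty> categories = new HashMap<>();

    CategorySetSketch add(String value, CategoryProperty property) {
        categories.put(value, property);
        return this;
    }

    // With a category set: null is missing, listed values use their declared
    // property, and (an assumption of this sketch) unlisted values are INVALID.
    CategoryProperty classify(String value) {
        if (value == null) {
            return CategoryProperty.MISSING;
        }
        return categories.getOrDefault(value, CategoryProperty.INVALID);
    }

    // Without a category set: null is missing and every other value is valid,
    // mirroring the typical defaults described in the text.
    static CategoryProperty classifyWithDefaults(String value) {
        return value == null ? CategoryProperty.MISSING : CategoryProperty.VALID;
    }

    public static void main(String[] args) {
        CategorySetSketch marital = new CategorySetSketch()
                .add("married", CategoryProperty.VALID)
                .add("single", CategoryProperty.VALID)
                .add("divorced", CategoryProperty.VALID)
                .add("widowed", CategoryProperty.VALID);

        System.out.println(marital.classify("single"));       // VALID
        System.out.println(marital.classify("unknown"));      // INVALID
        System.out.println(marital.classify(null));           // MISSING
        System.out.println(classifyWithDefaults("unknown"));  // VALID
    }
}
```

The last two lines show why a category set matters: the same unlisted value is rejected when valid values are declared, but accepted under the defaults.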
In this example, the logical data object is saved with the name attrition_logical_data and is associated with the build settings object using the setLogicalDataName method, as shown in line 42. In the build task, the physical dataset attributes can be explicitly mapped to the logical data attributes, as shown in lines 49 to 51. This mapping tells the build operation which logical attribute specification is associated with which physical attribute. If no explicit mapping is given, attribute name equivalence is used to associate physical attributes with logical attributes.
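The mapping rule can be sketched as a small resolution function: an explicit mapping takes precedence, name equivalence is the fallback, and attributes matched by neither are left to the engine's defaults. The class AttributeMappingSketch, its resolve method, and the attribute names marital_status and age are hypothetical, chosen only to illustrate the rule; this is not how any particular DME implements it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of physical-to-logical attribute resolution.
public class AttributeMappingSketch {

    // Returns physical name -> logical name. Explicit mappings win; otherwise a
    // logical attribute with the same name is used. Attributes matched by
    // neither rule are omitted and would fall back to engine defaults.
    static Map<String, String> resolve(List<String> physicalNames,
                                       Set<String> logicalNames,
                                       Map<String, String> explicitMap) {
        Map<String, String> result = new HashMap<>();
        for (String physical : physicalNames) {
            String logical = explicitMap.get(physical);
            if (logical == null && logicalNames.contains(physical)) {
                logical = physical; // name equivalence
            }
            if (logical != null) {
                result.put(physical, logical);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> physical = List.of("cust_id", "marital_status", "age");
        Set<String> logical = Set.of("Marital Status", "age");
        Map<String, String> explicit = Map.of("marital_status", "Marital Status");

        Map<String, String> resolved = resolve(physical, logical, explicit);
        System.out.println(resolved.get("marital_status"));   // Marital Status
        System.out.println(resolved.get("age"));              // age
        System.out.println(resolved.containsKey("cust_id"));  // false
    }
}
```

Here marital_status needs an explicit entry because its physical name differs from the logical name Marital Status, whereas age is matched purely by name equivalence.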