classification phase. The third type of technique replaces missing values with a probability distribution. For example, Quinlan's method [5] assigns probability distributions to each node of the decision tree when learning from training instances.
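To make the idea concrete, the following is a minimal sketch, not Quinlan's actual C4.5 implementation, of classifying an instance in this probabilistic style: when a node tests an attribute whose value is unknown in the instance, the instance's weight is divided among the branches in proportion to the training cases that followed each of them. The Node structure and its field names are assumptions made for illustration.

```python
from collections import Counter

class Node:
    """Decision tree node; class_counts is assumed to be maintained at
    every node (counts of the training cases that reached it)."""
    def __init__(self, attribute=None, branches=None, class_counts=None):
        self.attribute = attribute          # attribute tested here; None at a leaf
        self.branches = branches or {}      # attribute value -> child Node
        self.class_counts = class_counts or Counter()

def classify(node, instance, weight=1.0):
    """Return a Counter mapping class labels to probability mass."""
    if node.attribute is None:              # leaf: all weight goes to its class distribution
        total = sum(node.class_counts.values())
        return Counter({c: weight * n / total for c, n in node.class_counts.items()})
    value = instance.get(node.attribute)
    if value in node.branches:              # known value: follow the matching branch
        return classify(node.branches[value], instance, weight)
    # unknown value: distribute the weight over the branches in proportion
    # to the number of training cases that followed each branch
    total = sum(sum(child.class_counts.values()) for child in node.branches.values())
    result = Counter()
    for child in node.branches.values():
        share = sum(child.class_counts.values()) / total
        result += classify(child, instance, weight * share)
    return result
```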
The fourth type of technique focuses on the classification phase and uses another attribute in place of the unknown one, in order to continue classifying the current case; the selected attribute is correlated with the unknown attribute. For example, the CART method [1], which constructs binary decision trees, uses a surrogate split when an unknown value is encountered in the attribute originally selected. A surrogate split is a split that resembles the best split in the sense that it produces a similar partition of the cases in the current node.
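The following sketch illustrates how a surrogate split might be selected, assuming binary categorical splits represented as an (attribute, values-sent-left) pair. It scores each candidate by how often its partition agrees with the primary split on cases where both attributes are known; CART's actual criterion is more refined (it also compares candidates against a majority-direction baseline), so this is an illustration of the idea rather than the method itself.

```python
def goes_left(case, attribute, left_values):
    """A case goes left when its value for the attribute is in left_values."""
    return case.get(attribute) in left_values

def surrogate_split(cases, primary, candidates):
    """primary and each candidate are (attribute, left_values) pairs.
    Return the candidate whose partition agrees most often with the
    primary split, measured on cases where both attributes are known."""
    best, best_agreement = None, -1.0
    for cand in candidates:
        known = [c for c in cases
                 if c.get(primary[0]) is not None and c.get(cand[0]) is not None]
        if not known:
            continue
        agree = sum(goes_left(c, *primary) == goes_left(c, *cand) for c in known)
        agreement = agree / len(known)
        if agreement > best_agreement:
            best, best_agreement = cand, agreement
    return best
```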
Algorithms for constructing decision trees, such as [1, 5], create a single best decision tree during the training phase, and this tree is then used to classify new instances. The fifth type of technique constructs only the best classification rule instead of the whole decision tree. For example, the dynamic path generation method [15] produces only the path (i.e., the rule) needed to classify the case currently under consideration, instead of generating the whole decision tree beforehand. This method can deal with missing values in a very flexible way: when an attribute of a new instance has a missing value, that attribute is never branched on while classifying the instance. Similarly, the lazy decision tree method [7] conceptually constructs the best decision tree for each test instance; in practice, only a classification path needs to be constructed. Missing attribute values are naturally handled by considering only splits on attribute values that are known in the test instance. Training instances with unknowns filter down and are excluded only when their value is unknown for a given test in a path.
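The sketch below illustrates this lazy, per-instance style of classification under simplifying assumptions: attributes with unknown values in the test instance are never considered, and training cases are excluded only at the step whose test attribute is unknown for them. A plain entropy-based information gain is used as the selection criterion; the cited methods each use their own.

```python
import math
from collections import Counter

def entropy(cases):
    counts = Counter(c['class'] for c in cases)
    total = len(cases)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def gain(cases, attribute):
    """Information gain of splitting on attribute, over its known cases."""
    known = [c for c in cases if c.get(attribute) is not None]
    if not known:
        return 0.0
    split = Counter(c[attribute] for c in known)
    remainder = sum((n / len(known)) * entropy([c for c in known if c[attribute] == v])
                    for v, n in split.items())
    return entropy(known) - remainder

def classify_lazily(cases, instance, attributes):
    # only attributes whose value is known in the test instance are usable
    usable = [a for a in attributes if instance.get(a) is not None]
    while usable and len({c['class'] for c in cases}) > 1:
        best = max(usable, key=lambda a: gain(cases, a))
        value = instance[best]
        # keep the cases matching the test; cases with an unknown value for
        # this attribute are excluded only at this step
        cases = [c for c in cases if c.get(best) == value] or cases
        usable.remove(best)
    return Counter(c['class'] for c in cases).most_common(1)[0][0]
```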
The last type of approach uses decision trees to fill in missing values. For example, Shapiro's method [21] constructs a decision tree for an unknown attribute using the subset of the original training set consisting of those instances whose value of the unknown attribute is defined. The class is regarded as another attribute and participates in the construction of the decision tree for this attribute. This method is used only in the building phase.
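A sketch of the data reorganization behind this approach is given below: to impute an attribute, a tree is trained on the instances where that attribute is known, with the remaining attributes plus the original class as predictors and the attribute itself as the target. Here `fit_tree` and `predict` stand for any decision tree learner and its prediction routine; they are placeholders, not a specific library API.

```python
def impute_with_tree(training_set, target_attr, fit_tree, predict):
    """Fill in missing values of target_attr using a decision tree trained
    on the instances where target_attr is known."""
    known = [case for case in training_set if case.get(target_attr) is not None]
    unknown = [case for case in training_set if case.get(target_attr) is None]

    def as_features(case):
        # the class is treated as just another predictor attribute
        return {k: v for k, v in case.items() if k != target_attr}

    tree = fit_tree([as_features(c) for c in known],
                    [c[target_attr] for c in known])
    for case in unknown:
        case[target_attr] = predict(tree, as_features(case))
    return training_set
```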
We now present the Ordered Attribute Trees (OAT) method [16], which also deals with missing values and which we have studied in more detail.
4.1.2 Ordered Attribute Trees Method
Ordered Attribute Trees (OAT) is a supervised learning method for filling in missing values in categorical data. It uses decision trees as models for estimating unknown values. The method constructs a decision tree for each attribute, using a training subset that contains the instances with known values for that attribute. The cases in the training subset for a target attribute are described only by the attributes whose relation with the class is weaker than the relation between the target attribute and the class. The resulting decision tree is called an attribute tree. This method uses Mutual Information [3] to measure the strength of the relation between each attribute and the class.
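The sketch below outlines this workflow under stated assumptions: the strength of the relation between an attribute and the class is estimated as the mutual information computed from known-value counts, attributes are imputed in increasing order of this value, and each attribute tree uses as predictors only the attributes processed before it. `fit_tree` and `predict` are again placeholders for any decision tree learner.

```python
import math
from collections import Counter

def mutual_information(cases, attribute, class_attr='class'):
    """Estimate I(attribute; class) from the cases where the attribute is known."""
    known = [c for c in cases if c.get(attribute) is not None]
    if not known:
        return 0.0
    n = len(known)
    pa = Counter(c[attribute] for c in known)
    pc = Counter(c[class_attr] for c in known)
    joint = Counter((c[attribute], c[class_attr]) for c in known)
    return sum((nac / n) * math.log2((nac / n) / ((pa[a] / n) * (pc[cl] / n)))
               for (a, cl), nac in joint.items())

def oat_impute(cases, attributes, fit_tree, predict, class_attr='class'):
    """Fill unknown values attribute by attribute, weakest relation first."""
    ordered = sorted(attributes, key=lambda a: mutual_information(cases, a, class_attr))
    processed = []                   # attributes already imputed (weaker relation)
    for target in ordered:
        known = [c for c in cases if c.get(target) is not None]
        if processed:                # build the attribute tree on the known cases
            tree = fit_tree([{p: c.get(p) for p in processed} for c in known],
                            [c[target] for c in known])
            fill = lambda c: predict(tree, {p: c.get(p) for p in processed})
        else:                        # the first attribute has no predictors:
            majority = Counter(c[target] for c in known).most_common(1)[0][0]
            fill = lambda c: majority    # fall back to the majority known value
        for c in cases:
            if c.get(target) is None:
                c[target] = fill(c)
        processed.append(target)
    return cases
```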