classification phase. The third type of technique replaces missing values with a probability distribution. For example, Quinlan's method [5] assigns probability distributions to each node of the decision tree when learning from training instances.
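To make the idea concrete, the following is a minimal sketch, not Quinlan's actual C4.5 implementation, of classifying an instance in this probabilistic style: when a node tests an attribute whose value is unknown in the instance, the instance's weight is divided among the branches in proportion to the training cases that followed each of them. The Node structure and its field names are assumptions made for illustration.

```python
from collections import Counter

class Node:
    """Decision tree node; class_counts is assumed to be maintained at
    every node (counts of the training cases that reached it)."""
    def __init__(self, attribute=None, branches=None, class_counts=None):
        self.attribute = attribute          # attribute tested here; None at a leaf
        self.branches = branches or {}      # attribute value -> child Node
        self.class_counts = class_counts or Counter()

def classify(node, instance, weight=1.0):
    """Return a Counter mapping class labels to probability mass."""
    if node.attribute is None:              # leaf: all weight goes to its class distribution
        total = sum(node.class_counts.values())
        return Counter({c: weight * n / total for c, n in node.class_counts.items()})
    value = instance.get(node.attribute)
    if value in node.branches:              # known value: follow the matching branch
        return classify(node.branches[value], instance, weight)
    # unknown value: distribute the weight over the branches in proportion
    # to the number of training cases that followed each branch
    total = sum(sum(child.class_counts.values()) for child in node.branches.values())
    result = Counter()
    for child in node.branches.values():
        share = sum(child.class_counts.values()) / total
        result += classify(child, instance, weight * share)
    return result
```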
The fourth type of technique focuses on the classification phase and uses another attribute in place of the unknown one, in order to continue classifying the current case; the selected attribute is correlated with the unknown attribute. For example, the CART method [1], which constructs binary decision trees, uses a surrogate split when an unknown value is encountered in the attribute originally selected. A surrogate split is a split that resembles the best split in the sense that it produces a similar partition of the cases in the current node.
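The following sketch illustrates how a surrogate split might be selected, assuming binary categorical splits represented as an (attribute, values-sent-left) pair. It scores each candidate by how often its partition agrees with the primary split on cases where both attributes are known; CART's actual criterion is more refined (it also compares candidates against a majority-direction baseline), so this is an illustration of the idea rather than the method itself.

```python
def goes_left(case, attribute, left_values):
    """A case goes left when its value for the attribute is in left_values."""
    return case.get(attribute) in left_values

def surrogate_split(cases, primary, candidates):
    """primary and each candidate are (attribute, left_values) pairs.
    Return the candidate whose partition agrees most often with the
    primary split, measured on cases where both attributes are known."""
    best, best_agreement = None, -1.0
    for cand in candidates:
        known = [c for c in cases
                 if c.get(primary[0]) is not None and c.get(cand[0]) is not None]
        if not known:
            continue
        agree = sum(goes_left(c, *primary) == goes_left(c, *cand) for c in known)
        agreement = agree / len(known)
        if agreement > best_agreement:
            best, best_agreement = cand, agreement
    return best
```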
Algorithms for constructing decision trees, such as [1, 5], create a single best decision tree during the training phase, and this tree is then used to classify new instances. The fifth type of technique constructs only the best classification rule instead of the whole decision tree. For example, the dynamic path generation method [15] produces only the path (i.e., the rule) needed to classify the case currently under consideration, instead of generating the whole decision tree beforehand. This method can deal with missing values in a very flexible way: when an attribute of a new instance has a missing value, that attribute is never branched on while classifying the instance. Similarly, the lazy decision tree method [7] conceptually constructs the best decision tree for each test instance; in practice, only a classification path needs to be constructed. Missing attribute values are naturally handled by considering only splits on attribute values that are known in the test instance. Training instances with unknowns filter down and are excluded only when their value is unknown for a given test in a path.
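The sketch below illustrates this lazy, per-instance style of classification under simplifying assumptions: attributes with unknown values in the test instance are never considered, and training cases are excluded only at the step whose test attribute is unknown for them. A plain entropy-based information gain is used as the selection criterion; the cited methods each use their own.

```python
import math
from collections import Counter

def entropy(cases):
    counts = Counter(c['class'] for c in cases)
    total = len(cases)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def gain(cases, attribute):
    """Information gain of splitting on attribute, over its known cases."""
    known = [c for c in cases if c.get(attribute) is not None]
    if not known:
        return 0.0
    split = Counter(c[attribute] for c in known)
    remainder = sum((n / len(known)) * entropy([c for c in known if c[attribute] == v])
                    for v, n in split.items())
    return entropy(known) - remainder

def classify_lazily(cases, instance, attributes):
    # only attributes whose value is known in the test instance are usable
    usable = [a for a in attributes if instance.get(a) is not None]
    while usable and len({c['class'] for c in cases}) > 1:
        best = max(usable, key=lambda a: gain(cases, a))
        value = instance[best]
        # keep the cases matching the test; cases with an unknown value for
        # this attribute are excluded only at this step
        cases = [c for c in cases if c.get(best) == value] or cases
        usable.remove(best)
    return Counter(c['class'] for c in cases).most_common(1)[0][0]
```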
The last type of approach uses decision trees to fill in missing values. For example, Shapiro's method [21] constructs a decision tree for an unknown attribute using the subset of the original training set consisting of those instances whose value of the unknown attribute is defined. The class is regarded as another attribute and participates in the construction of the decision tree for this attribute. This method is used only in the building phase.
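A sketch of the data reorganization behind this approach is given below: to impute an attribute, a tree is trained on the instances where that attribute is known, with the remaining attributes plus the original class as predictors and the attribute itself as the target. Here `fit_tree` and `predict` stand for any decision tree learner and its prediction routine; they are placeholders, not a specific library API.

```python
def impute_with_tree(training_set, target_attr, fit_tree, predict):
    """Fill in missing values of target_attr using a decision tree trained
    on the instances where target_attr is known."""
    known = [case for case in training_set if case.get(target_attr) is not None]
    unknown = [case for case in training_set if case.get(target_attr) is None]

    def as_features(case):
        # the class is treated as just another predictor attribute
        return {k: v for k, v in case.items() if k != target_attr}

    tree = fit_tree([as_features(c) for c in known],
                    [c[target_attr] for c in known])
    for case in unknown:
        case[target_attr] = predict(tree, as_features(case))
    return training_set
```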
We now present the Ordered Attribute Trees (OAT) method [16], which also deals with missing values and which we have studied in more detail.
4.1.2 Ordered Attribute Trees Method
Ordered Attribute Trees (OAT) is a supervised learning method for filling in missing values in categorical data. It uses decision trees as models for estimating unknown values. The method constructs a decision tree for each attribute, using a training subset that contains the instances with known values for that attribute. The cases in the training subset for a target attribute are described only by the attributes whose relation with the class is weaker than the relation between the target attribute and the class. The resulting decision tree is called an attribute tree. This method uses Mutual Information [3] to measure the strength of the relation between each attribute and the class.
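The sketch below outlines this workflow under stated assumptions: the strength of the relation between an attribute and the class is estimated as the mutual information computed from known-value counts, attributes are imputed in increasing order of this value, and each attribute tree uses as predictors only the attributes processed before it. `fit_tree` and `predict` are again placeholders for any decision tree learner.

```python
import math
from collections import Counter

def mutual_information(cases, attribute, class_attr='class'):
    """Estimate I(attribute; class) from the cases where the attribute is known."""
    known = [c for c in cases if c.get(attribute) is not None]
    if not known:
        return 0.0
    n = len(known)
    pa = Counter(c[attribute] for c in known)
    pc = Counter(c[class_attr] for c in known)
    joint = Counter((c[attribute], c[class_attr]) for c in known)
    return sum((nac / n) * math.log2((nac / n) / ((pa[a] / n) * (pc[cl] / n)))
               for (a, cl), nac in joint.items())

def oat_impute(cases, attributes, fit_tree, predict, class_attr='class'):
    """Fill unknown values attribute by attribute, weakest relation first."""
    ordered = sorted(attributes, key=lambda a: mutual_information(cases, a, class_attr))
    processed = []                   # attributes already imputed (weaker relation)
    for target in ordered:
        known = [c for c in cases if c.get(target) is not None]
        if processed:                # build the attribute tree on the known cases
            tree = fit_tree([{p: c.get(p) for p in processed} for c in known],
                            [c[target] for c in known])
            fill = lambda c: predict(tree, {p: c.get(p) for p in processed})
        else:                        # the first attribute has no predictors:
            majority = Counter(c[target] for c in known).most_common(1)[0][0]
            fill = lambda c: majority    # fall back to the majority known value
        for c in cases:
            if c.get(target) is None:
                c[target] = fill(c)
        processed.append(target)
    return cases
```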