as a measure of the strength of the relations between the attributes and the class².
There is a fixed order for constructing the attribute trees, guided by the mutual information between each attribute and the class: the method ranks the attributes from low to high mutual information and builds the attribute trees in that order. These trees are used to determine the unknown values of each attribute. The first attribute tree constructed is a one-node tree holding the most frequent value of the attribute. The tree for an attribute A_i is built from a training subset that contains the instances whose value for A_i is known, together with the attributes whose missing values have already been filled in; consequently, the attributes A_k for which MI(A_i, C) < MI(A_k, C) are excluded [16]. When computing MI(A_i, C), instances with a missing value for A_i are ignored [18]. This method is not general enough to be applicable to every domain [18]: domains with strong relations between the attributes appear to be the most suitable for the OAT method. The idea of starting with the attribute that is least dependent on the class [16, 17, 18] is interesting, because that attribute has the least influence on the class.
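The ordering step described above can be sketched as follows. This is a minimal sketch, not the implementation of [16]: the row layout, the `None` missing-value sentinel, and all function names are assumptions.

```python
# Sketch of the OAT ordering step: rank the attributes by MI(A_i, C),
# from low to high, skipping instances whose value for A_i is missing.
# The row layout and the `None` missing-value sentinel are assumptions.
import math
from collections import Counter

def mutual_information(xs, ys):
    """MI(X, Y) = sum_{x,y} P(x,y) log2(P(x,y) / (P(x) P(y)))."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((nxy / n) * math.log2((nxy / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), nxy in pxy.items())

def oat_order(instances, attribute_indices, class_index):
    """Return attribute indices ordered from low to high MI with the class."""
    scored = []
    for a in attribute_indices:
        pairs = [(row[a], row[class_index]) for row in instances
                 if row[a] is not None]  # ignore missing values of A_i
        xs, ys = zip(*pairs)
        scored.append((mutual_information(list(xs), list(ys)), a))
    return [a for _, a in sorted(scored)]
```

On a toy table where the first attribute fully determines the class and the second is independent of it, `oat_order` places the second attribute first, so its tree would be built first.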
4.1.3
C4.5's Method
Quinlan's method [5] assigns probability distributions to the nodes of the decision tree when learning from training instances. At each node, the probability distribution over the values of the attribute tested there is estimated from the relative frequencies of those values among the training instances collected at that node. The result of classification is then a class distribution rather than a single class. This approach works well when most of the attributes are independent, because it relies only on the prior distribution of the values of the attribute tested at each node [5, 23].
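The node-level bookkeeping can be sketched as follows. This is a hedged illustration of the idea, not Quinlan's actual code: the `Node` structure and all names are assumptions, and an instance with an unknown test value is split across branches weighted by the relative frequencies observed at the node.

```python
# Sketch of C4.5-style probabilistic classification: an instance whose
# value for the tested attribute is unknown is passed down every branch,
# weighted by the relative frequency of each branch's value at the node.
# The Node structure and names are illustrative assumptions.
from collections import Counter

class Node:
    def __init__(self, attr=None, children=None, class_counts=None):
        self.attr = attr                 # attribute tested here (None = leaf)
        self.children = children or {}   # attribute value -> child Node
        self.class_counts = class_counts or Counter()  # training class freqs

def classify(node, instance):
    """Return a class -> probability distribution for `instance`."""
    if node.attr is None:  # leaf: relative class frequencies
        total = sum(node.class_counts.values())
        return {c: n / total for c, n in node.class_counts.items()}
    value = instance.get(node.attr)
    if value in node.children:
        return classify(node.children[value], instance)
    # Unknown attribute value: weight each branch by the relative
    # frequency of its value among the training instances at this node.
    total = sum(sum(ch.class_counts.values()) for ch in node.children.values())
    dist = Counter()
    for child in node.children.values():
        weight = sum(child.class_counts.values()) / total
        for c, p in classify(child, instance).items():
            dist[c] += weight * p
    return dict(dist)
```

For instance, at a node testing an attribute whose branches collected 3 and 1 training instances, an instance lacking that attribute is split 0.75/0.25 between the branches, and the returned class distribution mixes the leaf distributions with those weights.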
4.1.4
Conclusion
We observe that the methods above have some drawbacks. For example, the methods of [5, 13, 22] determine the missing attribute values only once for each object with an unknown attribute. The Dynamic path generation method and the lazy
² Mutual Information (MI) between two categorical random variables X and Y is the average reduction in uncertainty about X that results from learning the value of Y:

MI(X, Y) = −∑_{x ∈ D_x} P(x) log₂ P(x) + ∑_{y ∈ D_y} P(y) ∑_{x ∈ D_x} P(x|y) log₂ P(x|y)
D_x and D_y are the domains of the categorical random variables X and Y. P(x) and P(y) are the probabilities of occurrence of x ∈ D_x and y ∈ D_y, respectively. P(x|y) is the conditional probability of X having the value x once Y is known to have the value y.
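As a quick sanity check, the footnote's formula can be evaluated numerically and compared with the equivalent joint form MI(X, Y) = ∑ P(x, y) log₂(P(x, y) / (P(x) P(y))). The joint distribution below is made up purely for illustration.

```python
# Numeric check of the footnote's formula, MI(X, Y) = H(X) - H(X|Y),
# on a small made-up joint distribution (the numbers are illustrative).
import math

pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}  # P(x, y)
px = {x: sum(p for (x2, _), p in pxy.items() if x2 == x) for x in (0, 1)}
py = {y: sum(p for (_, y2), p in pxy.items() if y2 == y) for y in (0, 1)}

# First term: -sum_x P(x) log2 P(x)  (i.e. H(X))
h_x = -sum(p * math.log2(p) for p in px.values())
# Second term: sum_y P(y) sum_x P(x|y) log2 P(x|y)  (i.e. -H(X|Y))
second = sum(py[y] * sum((pxy[(x, y)] / py[y]) * math.log2(pxy[(x, y)] / py[y])
                         for x in (0, 1))
             for y in (0, 1))
mi = h_x + second  # about 0.278 bits for this distribution
```

The two forms agree because ∑_y P(y) ∑_x P(x|y) log₂ P(x|y) is exactly −H(X|Y), so the sum is H(X) − H(X|Y).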