Dealing with Missing Values in a Probabilistic Decision Tree during Classification - Mining Complex Data

following the branches according to the attribute values of the instance. When we encounter a missing value for a test attribute (test node), we must trace all the paths corresponding to the values of this attribute. In this case, we reach several leaves in the tree rather than the single leaf reached in classical classification. We must therefore calculate the class probability at each of these leaves.
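The traversal described above can be sketched in Python. This is only an illustrative sketch: the `Node` and `Leaf` classes, the `reached_leaves` function, and the toy attributes (`outlook`, `humidity`) are hypothetical names introduced here, not part of the original work. A missing value is represented by an absent key or `None`.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    name: str  # hypothetical leaf label, e.g. "F1"


@dataclass
class Node:
    attribute: str  # the test attribute at this node
    children: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def reached_leaves(tree, instance, path=()):
    """Return a list of (leaf, path) pairs reached by the instance.

    `path` is a tuple of (attribute, value) branches from the root.
    When the test attribute is missing, every branch is followed.
    """
    if isinstance(tree, Leaf):
        return [(tree, path)]
    value = instance.get(tree.attribute)  # None means the value is missing
    if value is None:
        branches = list(tree.children.items())  # follow all branches
    else:
        branches = [(value, tree.children[value])]  # follow the single matching branch
    results = []
    for branch_value, child in branches:
        step = path + ((tree.attribute, branch_value),)
        results.extend(reached_leaves(child, instance, step))
    return results


# Toy tree, made up for illustration only.
tree = Node("outlook", {
    "sunny": Node("humidity", {"high": Leaf("F1"), "normal": Leaf("F2")}),
    "rainy": Leaf("F3"),
})

# "humidity" is missing, so both leaves under "sunny" are reached.
leaves = reached_leaves(tree, {"outlook": "sunny"})
print([leaf.name for leaf, _ in leaves])
```

Each returned path lists the branches that lead to the corresponding leaf, which is the information needed to compute the probability of that leaf.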

Let us assume that the class has two values, $A$ and $D$, and that a path from the root of the tree to a leaf $F$ goes through the branches $B_1, B_2, \ldots, B_n$. Then:

$$P(\text{class } A \text{ at leaf } F) = P(A \mid \text{path from the root to } F) = P(A \mid B_1, B_2, \ldots, B_n)$$

$$P(\text{class } D \text{ at leaf } F) = P(D \mid \text{path from the root to } F) = P(D \mid B_1, B_2, \ldots, B_n)$$

$$P(A \text{ in the tree}) = \sum_{i=1}^{m} P(A \mid F_i) \ast P(F_i)$$

$$P(D \text{ in the tree}) = \sum_{i=1}^{m} P(D \mid F_i) \ast P(F_i)$$

where $i = 1, \ldots, m$ and $m$ is the number of leaves in the tree.

The probability $P(A \mid F_i)$ is the conditional probability of class $A$ at this leaf; the probability $P(F_i)$ is the joint probability of the attributes on the path that runs from the root to the leaf $F_i$.
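As a minimal sketch of this weighted combination, the following Python function sums, for each class, the product of the class probability at a leaf and the probability of that leaf. The function name `tree_class_probabilities` and all numbers are hypothetical; in particular, the leaf probabilities are simply assumed to be given and to sum to 1 over the reached leaves.

```python
def tree_class_probabilities(leaves):
    """Combine per-leaf class distributions into one distribution.

    `leaves` is a list of (class_probs, p_leaf) pairs, where
    class_probs maps each class to P(class | F_i) and p_leaf is P(F_i).
    """
    totals = {}
    for class_probs, p_leaf in leaves:
        for cls, p_cls in class_probs.items():
            # Add the contribution P(class | F_i) * P(F_i) of this leaf.
            totals[cls] = totals.get(cls, 0.0) + p_cls * p_leaf
    return totals


# Made-up values for two reached leaves F_1 and F_2.
reached = [
    ({"A": 0.8, "D": 0.2}, 0.5),    # P(A|F_1), P(D|F_1), P(F_1)
    ({"A": 0.25, "D": 0.75}, 0.5),  # P(A|F_2), P(D|F_2), P(F_2)
]

probs = tree_class_probabilities(reached)
print(probs)
```

With these assumed values, class $A$ receives $0.8 \ast 0.5 + 0.25 \ast 0.5 = 0.525$ and class $D$ receives $0.475$.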

To simplify, let us consider that the path from the root of the tree to $F_i$ goes through only the branches $B_1$ and $B_2$:

$$P(F_i) = P(B_1, B_2) = P(B_1) \ast P(B_2 \mid B_1)$$

where $B_1$ is less dependent on the class than $B_2$ (see footnote 4).
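The factorization $P(B_1, B_2) = P(B_1) \ast P(B_2 \mid B_1)$ can be checked on a small example. The sketch below estimates both factors from frequency counts; the records and the function name `chain_rule_joint` are invented for illustration and do not come from the original work, which obtains these probabilities from attribute trees rather than raw counts.

```python
# Toy records of (B1 value, B2 value); made up for illustration.
records = [
    ("x", "u"), ("x", "u"), ("x", "v"), ("y", "u"),
    ("y", "v"), ("y", "v"), ("y", "v"), ("x", "u"),
]


def chain_rule_joint(records, b1, b2):
    """Estimate P(B1=b1, B2=b2) as P(B1=b1) * P(B2=b2 | B1=b1)."""
    n = len(records)
    count_b1 = sum(1 for r in records if r[0] == b1)
    count_both = sum(1 for r in records if r == (b1, b2))
    if n == 0 or count_b1 == 0:
        return 0.0  # the branch B1=b1 never occurs in the records
    p_b1 = count_b1 / n                     # P(B1 = b1)
    p_b2_given_b1 = count_both / count_b1   # P(B2 = b2 | B1 = b1)
    return p_b1 * p_b2_given_b1


joint = chain_rule_joint(records, "x", "u")
print(joint)
```

Here $B_1 = x$ occurs in 4 of 8 records and $B_2 = u$ occurs in 3 of those 4, so the product is $0.5 \ast 0.75 = 0.375$, which matches the direct joint frequency $3/8$.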

4.3.1 Calculating the Joint Probability $P(B_1, B_2)$ Using Our Approach

To calculate this joint probability, we distinguish the following cases:

• $B_1$ and $B_2$ are independent: $P(B_2 \mid B_1) = P(B_2)$ and $P(B_1, B_2) = P(B_1) \ast P(B_2)$. Consequently, the PAT of $B_1$ is constructed without $B_2$ and the PAT of $B_2$ is constructed without $B_1$. We calculate the probability of the attribute $B_1$ from its PAT. The probability of $B_2$ is also calculated from its PAT.

• $B_1$ and $B_2$ are dependent, and the POAT of $B_1$ is constructed without $B_2$ because $B_1$ is less dependent on the class than $B_2$; in this approach, $P(B_1 \mid B_2)$ is taken to be $P(B_1)$. The probability of $B_1$ is calculated from its POAT. Note that the PAT of $B_2$ is constructed using $B_1$. Therefore, we calculate the conditional probability of $B_2$ given $B_1$, $P(B_2 \mid B_1)$, from the PAT of $B_2$.
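The two cases above can be summarized in a short Python sketch. The PAT and POAT structures themselves are not modeled here: plain lookup tables stand in for the probabilities that those trees would supply, and the function name `joint_probability` and all numbers are hypothetical.

```python
def joint_probability(b1, b2, p_b1, p_b2, p_b2_given_b1=None):
    """Return P(B1=b1, B2=b2) for the two cases discussed above.

    p_b1:          maps a value of B1 to P(B1), a stand-in for the
                   probability read from the PAT or POAT of B1.
    p_b2:          maps a value of B2 to P(B2), used only when the
                   attributes are independent.
    p_b2_given_b1: maps (b1, b2) to P(B2 | B1), a stand-in for the
                   probability read from the PAT of B2; None means
                   the attributes are treated as independent.
    """
    if p_b2_given_b1 is None:
        # Independent case: P(B1, B2) = P(B1) * P(B2).
        return p_b1[b1] * p_b2[b2]
    # Dependent case: P(B1, B2) = P(B1) * P(B2 | B1).
    return p_b1[b1] * p_b2_given_b1[(b1, b2)]


# Made-up probabilities for illustration.
p_b1 = {"x": 0.5, "y": 0.5}
p_b2 = {"u": 0.5, "v": 0.5}
p_b2_given_b1 = {
    ("x", "u"): 0.75, ("x", "v"): 0.25,
    ("y", "u"): 0.25, ("y", "v"): 0.75,
}

independent = joint_probability("x", "u", p_b1, p_b2)
dependent = joint_probability("x", "u", p_b1, p_b2, p_b2_given_b1)
print(independent, dependent)
```

With these assumed tables, the independent case gives $0.5 \ast 0.5 = 0.25$, while the dependent case gives $0.5 \ast 0.75 = 0.375$, which shows how ignoring the dependence would change the leaf probability.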

4 In our work, when two attributes are dependent and unknown at the same time (the cycle problem), we deal first with the attribute that is less dependent on the class, using its POAT. Then, for the other attribute, we use its PAT.
