$B_1$ and $B_2$ are dependent, and $B_1$ is the one less dependent on the class. There is another missing attribute $G$ on which $B_1$ and $B_2$ both depend; $G$ is less dependent on the class than $B_1$ and $B_2$.⁵
\[
P(B_1) = \sum_i P(B_1 \mid G_i)\, P(G_i)
\]
\[
P(B_2 \mid B_1) = \sum_i P(B_2 \mid B_1, G_i)\, P(G_i \mid B_1)
\]
\[
P(B_1, B_2) = \sum_i P(B_1, B_2, G_i) = \sum_i P(G_i)\, P(B_1 \mid G_i)\, P(B_2 \mid B_1, G_i) \tag{4.1}
\]
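To make the marginalization in Equation (4.1) concrete, here is a minimal Python sketch with a hypothetical three-valued missing attribute $G$; the probability tables (`p_g`, `p_b1_given_g`, `p_b2_given_b1_g`) are illustrative toy values, not taken from the paper.

```python
# Minimal sketch of Eq. (4.1): marginalizing the missing attribute G.
# All table values below are illustrative, not from the original text.

p_g = {0: 0.5, 1: 0.3, 2: 0.2}                # P(G_i)
p_b1_given_g = {0: 0.9, 1: 0.4, 2: 0.1}       # P(B1=1 | G_i)
p_b2_given_b1_g = {                           # P(B2=1 | B1, G_i)
    (1, 0): 0.8, (1, 1): 0.5, (1, 2): 0.3,
    (0, 0): 0.2, (0, 1): 0.4, (0, 2): 0.6,
}

# P(B1=1, B2=1) = sum_i P(G_i) * P(B1=1 | G_i) * P(B2=1 | B1=1, G_i)
p_b1_b2 = sum(p_g[g] * p_b1_given_g[g] * p_b2_given_b1_g[(1, g)]
              for g in p_g)
print(p_b1_b2)  # 0.5*0.9*0.8 + 0.3*0.4*0.5 + 0.2*0.1*0.3 = 0.426
```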
$B_1$ and $B_2$ are independent of each other, but both depend on another missing attribute $G$, which is less dependent on the class than $B_1$ and $B_2$:
\[
P(B_1, B_2) = \sum_i P(B_1, B_2, G_i) = \sum_i P(G_i)\, P(B_1 \mid G_i)\, P(B_2 \mid G_i)
\]
That is, $B_1$ and $B_2$ are conditionally independent given $G$.
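Continuing the toy sketch above (same illustrative values, not from the paper), conditional independence simply collapses the table $P(B_2 \mid B_1, G_i)$ to one that depends on $G$ alone:

```python
# Under conditional independence given G, P(B2 | B1, G_i) = P(B2 | G_i).
p_g = {0: 0.5, 1: 0.3, 2: 0.2}            # P(G_i), same toy values as before
p_b1_given_g = {0: 0.9, 1: 0.4, 2: 0.1}   # P(B1=1 | G_i)
p_b2_given_g = {0: 0.7, 1: 0.5, 2: 0.2}   # P(B2=1 | G_i): B1 no longer enters

p_b1_b2 = sum(p_g[g] * p_b1_given_g[g] * p_b2_given_g[g] for g in p_g)
print(p_b1_b2)  # 0.5*0.9*0.7 + 0.3*0.4*0.5 + 0.2*0.1*0.2 = 0.379
```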
4.4 Experiment
In our experiments, we tested our approach on several databases from the UCI repository [20]. Each database was tested with several thresholds. To choose a threshold, we compute the average Normalized Mutual Information⁶ between each attribute and the class, and then choose thresholds closest to this average value [9]. We compared our classification results with those generated by Quinlan's method [5] and found that our results are equal to or better than those given by C4.5. We present the tests performed on the vote database [20]. A training set of 232 instances with 16 discrete attributes (all Boolean, taking the values y or n) is used to construct our trees (POATs and PATs). The class in this database can take two values: Democrat and Republican. The training data has no missing values, but the test data we used contains 240 objects with missing values. The average value of the Normalized Mutual Information is 0.26. Therefore, we have tested our approach
⁵ We can also calculate the joint probability given in Equation (4.1) as follows:
\[
P(B_1, B_2) = P(B_1)\, P(B_2 \mid B_1) = P(B_1) \sum_i P(B_2 \mid B_1, G_i)\, P(G_i \mid B_1) = \sum_i P(B_2 \mid B_1, G_i)\, P(G_i \mid B_1)\, P(B_1) = \sum_i P(B_2 \mid B_1, G_i)\, P(B_1 \mid G_i)\, P(G_i)
\]
⁶ We use Normalized Mutual Information as proposed by Lobo and Numao [18] instead of Mutual Information. Normalized Mutual Information is defined as:
\[
MI_N(X, Y) = \frac{2\, MI(X, Y)}{\log \lVert D_x \rVert + \log \lVert D_y \rVert}
\]
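A small Python sketch of footnote 6's definition and of the threshold-selection step described above, assuming attributes are given as equal-length lists of discrete values; the function and variable names are hypothetical, and each variable is assumed to take at least two distinct values so the denominator is nonzero.

```python
import math
from collections import Counter

def normalized_mi(xs, ys):
    """Normalized Mutual Information per footnote 6:
    MI_N(X, Y) = 2 * MI(X, Y) / (log||D_x|| + log||D_y||).
    xs, ys: equal-length lists of discrete values (hypothetical interface);
    assumes each variable takes at least two distinct values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # MI(X, Y) = sum over (x, y) of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    mi = sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
             for (x, y), c in pxy.items())
    return 2 * mi / (math.log(len(px)) + math.log(len(py)))

# Threshold selection as described in the experiment: average the NMI
# between each attribute column and the class, then pick thresholds
# near that average (the attribute/class data here are illustrative).
attrs = [["y", "n", "y", "y"], ["n", "n", "y", "n"]]
cls = ["Democrat", "Republican", "Democrat", "Democrat"]
avg_nmi = sum(normalized_mi(a, cls) for a in attrs) / len(attrs)
print(avg_nmi)
```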