Table 4.13. Results of testing the Instance Analysis algorithm on the Vote database

attributes       instance 1    instance 2    instance 3
physician-fee    ?             ?             ?
el-salvador      y             ?             y
education        y             n             n
crime            y             n             n
near=8           (16%, 83%)    (91%, 08%)    (92%, 07%)
near=10          (29%, 70%)    (84%, 15%)    (75%, 24%)
near=12          (38%, 61%)    (70%, 29%)    (57%, 42%)
PAT              (11%, 89%)    (99%, 01%)    (85%, 15%)
C4.5             (53%, 47%)    (53%, 47%)    (53%, 47%)
The Vote database has 16 attributes, so the constant near may be 8, 9, 10, 11, or 12. Table 4.13 contains the results of testing this algorithm only when near is 8, 10, and 12. It also contains the results of testing the PAT approach and C4.5 on the same examples. Comparing the results of PAT and C4.5 with the statistical results given by the Instance Analysis algorithm for the same examples, we observe that PAT's results in Table 4.13 are closest to the statistical results when near is 8; moreover, they are better than the C4.5 results when near is 8, 10, or 12.
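The excerpt does not define the Instance Analysis algorithm itself, but the constant near reads naturally as a count of neighbouring training instances. Under that assumption only, a minimal sketch of how a near-based statistical class distribution could be produced follows; the function name, the overlap measure, and the data layout are all illustrative, not the book's method.

    from collections import Counter

    def near_distribution(training, instance, near):
        """Hypothetical reading of the Instance Analysis idea: take the
        `near` training records that agree most with `instance` on its
        known attribute values, and report their class frequencies."""
        def overlap(rec):
            # count the known attribute values that the record matches
            return sum(1 for a, v in instance.items()
                       if v != "?" and rec.get(a) == v)
        nearest = sorted(training, key=overlap, reverse=True)[:near]
        counts = Counter(rec["class"] for rec in nearest)
        return {cls: n / near for cls, n in counts.items()}

Such a procedure would count Democrats and Republicans among the near most similar voting records, which is consistent with the distributions in Table 4.13 becoming sharper as near shrinks.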
We note that the only attribute used to construct the decision tree with the C4.5 method is physician-fee-freeze, which has the greatest influence on the decision. However, if this attribute is unknown, as in Table 4.13, C4.5 calculates its frequency over all the training data without taking into account the other attributes that depend on it. Consequently, in the test data each object that has a missing value for physician-fee-freeze is classified Democrat with probability 0.53 and Republican with probability 0.47 (Table 4.13). For the threshold 0.4, our probabilistic decision tree corresponding to the training data is constructed using the attributes physician-fee-freeze, el-salvador-aid, and education-spending. The PAT for physician-fee-freeze is constructed using el-salvador-aid, education-spending, and crime. Consequently, when physician-fee-freeze is unknown, we calculate its probability according to the attributes it depends on, and so on. For example, in Table 4.13, we notice that in the PAT approach the probability distribution of each object depends on the values of its other attributes, whereas with C4.5 this distribution depends only on the frequency of physician-fee-freeze.
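The difference between the two fallback strategies can be sketched in a few lines of Python. Everything here is a toy illustration, not the book's implementation: the four records are hypothetical, and c45_style_distribution and pat_style_distribution merely contrast a global class-frequency fallback with an estimate conditioned on the dependent attributes.

    from collections import Counter

    # Hypothetical toy records for the Vote domain; names follow Table 4.13.
    training = [
        {"physician-fee-freeze": "y", "el-salvador-aid": "y",
         "education-spending": "n", "crime": "y", "class": "republican"},
        {"physician-fee-freeze": "n", "el-salvador-aid": "n",
         "education-spending": "y", "crime": "n", "class": "democrat"},
        {"physician-fee-freeze": "n", "el-salvador-aid": "y",
         "education-spending": "y", "crime": "y", "class": "democrat"},
        {"physician-fee-freeze": "y", "el-salvador-aid": "y",
         "education-spending": "n", "crime": "n", "class": "republican"},
    ]

    def c45_style_distribution(data):
        # C4.5 fallback: class frequency over all data, ignoring the
        # remaining attributes of the object being classified
        counts = Counter(rec["class"] for rec in data)
        total = sum(counts.values())
        return {cls: n / total for cls, n in counts.items()}

    def pat_style_distribution(data, instance, dependents):
        # PAT-style idea: condition on the attributes that the missing
        # attribute depends on; fall back to frequency if nothing matches
        matching = [rec for rec in data
                    if all(rec[a] == instance[a]
                           for a in dependents if instance[a] != "?")]
        return c45_style_distribution(matching or data)

    # instance 1 of Table 4.13: physician-fee-freeze is unknown
    inst = {"physician-fee-freeze": "?", "el-salvador-aid": "y",
            "education-spending": "y", "crime": "y"}
    print(c45_style_distribution(training))
    print(pat_style_distribution(training, inst,
          ["el-salvador-aid", "education-spending", "crime"]))

On these toy records the C4.5-style call returns the same split for every object with the value missing, while the PAT-style call changes with the object's other attribute values, mirroring the constant (53%, 47%) row versus the varying PAT row in Table 4.13.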
4.6 The Complexity of Constructing a Decision Tree
The complexity of a decision tree depends on the size of the training data and on the number of attributes in the data. For example, if the training data contain n objects and m attributes not counting the class, the computational complexity of the decision tree constructed from these training data is [19]:
O(n × (m + 1) × log(n))    (4.6)
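As a quick sanity check of the bound (4.6), it can be evaluated for concrete sizes. The snippet below is illustrative only: big-O notation hides constant factors and the logarithm base, and n = 435 is the standard size of the Vote database (its 16 attributes are stated above).

    import math

    def tree_construction_cost(n: int, m: int) -> float:
        # growth term of Eq. (4.6): n * (m + 1) * log(n); the logarithm
        # base is chosen arbitrarily, since big-O absorbs it
        return n * (m + 1) * math.log2(n)

    print(tree_construction_cost(435, 16))  # about 6.5e4 elementary steps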