Information Technology Reference
In-Depth Information
Complexity ( T rees )= O ( nbP ath Tree Log ( L Tree ) )
i = nbAT
+ O (
ComplexityTree ( Tree i ))
i =1
(4.11)
1 if no missing values
L Tree if all are missing
nbP T else
nbP ath Tree =
(4.12)
From equation 4.11, we remark that nbP ath Tree may be L Tree in the worst-case
when all the attributes are missing. Therefore, we can note that the complex-
ity O( nbP ath Tree Log v ( L Tree )) in this equation is quasilinear according to the
number of leaves in the final decision tree.
Consequently, the total complexity of classifying a new instance with missing
values becomes:
Complexity ( T rees )= O ( nbP ath Tree Log ( L Tree ) )
+ O ( m × v m log ( L T ))
This complexity is exponential in the number of missing attributes in the instance
to classify and quasilinear in the number of leaves in the final decision tree.
4.7
Conclusion and Perspectives
In this chapter, we have introduced a probabilistic approach to fill missing val-
ues in decision trees during classification. We proposed replacing an unknown
attribute with a probability distribution and taking into account the depen-
dence between attributes. We presented the results of tests performed on several
databases and we compared our results with those given by C4.5 and Lobo's ap-
proach for the same databases. We have observed that the results of our approach
are better than those obtained by OAT and C4.5. To measure the quality of our
classification results, we used an approach derived from Relief and its extensions
to calculate the distance between two instances with missing values. For each
instance in the test data, our Analysis-Instance algorithm gives the frequency
of its nearest instances from each class. We then compared the results obtained
by the Analysis-Instance algorithm with PAT and C4.5 results. We observed
that our classification results are closer to the Analysis-Instance algorithm re-
sults and better than those given by C4.5. The complexity of constructing our
attributes trees in the two approaches (PAOTs, PATs) and the complexity of
classifying a new instance with missing values using our approach were briefly
presented in this chapter. We found that the complexity of our classification al-
gorithm is exponential according to the number of missing attributes in the test
instance [10].
In the future, we aim at using a statistical test to calculate the dependence
between attributes instead of Mutual Information. We are also going to test our
 
Search WWH ::




Custom Search