4 Dealing with Missing Values in a Probabilistic Decision Tree during Classification
Lamis Hawarah, Ana Simonet, and Michel Simonet
Institut d'Ingénierie et de l'Information de Santé (TIMC)
Faculté de Médecine
38700 La Tronche, France
{lamis.hawarah, ana.simonet, michel.simonet}@imag.fr
This chapter deals with the problem of missing values in decision trees during
classification. Our approach is derived from the ordered attribute trees method,
proposed by Lobo and Numao in 2000, which builds a decision tree for each attribute
and uses these trees to fill in missing attribute values. Our method takes into
account the dependence between attributes by using Mutual Information. The result
of the classification process is a probability distribution over the classes instead
of a single class. In this chapter, we explain our approach, then present the results
of tests performed on several real databases and compare them with those obtained by
Lobo's method and Quinlan's method. We also measure the quality of our classification
results. Finally, we calculate the complexity of our approach and discuss some
perspectives.
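The abstract does not say how Mutual Information is computed. As a minimal sketch, assuming discrete attributes and probabilities estimated by empirical frequencies over the training set, the standard definition I(X;Y) = sum over (x, y) of p(x,y) * log2( p(x,y) / (p(x) p(y)) ) can be computed as below. The function name mutual_information and the choice to skip examples where either value is missing are assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete attributes.

    xs and ys hold the values of attributes X and Y for the same examples.
    Examples where either value is missing (None) are skipped; this is an
    assumption of the sketch, not a rule taken from the chapter.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)               # counts of (x, y)
    px = Counter(x for x, _ in pairs)    # marginal counts of x
    py = Counter(y for _, y in pairs)    # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), rewritten with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    # Y is a copy of X (two equally frequent values): 1 bit of shared information.
    print(mutual_information(["a", "a", "b", "b"], ["a", "a", "b", "b"]))  # 1.0
    # Y is independent of X in this sample: no shared information.
    print(mutual_information(["a", "a", "b", "b"], ["x", "y", "x", "y"]))  # 0.0
```

A value of 0 means the two attributes are independent in the sample, and larger values indicate stronger dependence; this is the kind of quantity an approach that exploits dependence between attributes can use to judge which attributes are informative about a missing one.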
4.1 Introduction
In classification, the goal of a learning algorithm is to build a classifier from a
training set in which each example is assigned to a class. Classification is the
task of assigning objects to their respective categories. Decision trees are one of
the most popular classification methods currently used in data mining and machine
learning; they belong to the family of supervised classification methods. Once
built, a decision tree is used to classify new cases. A case is classified by
starting at the root node of the tree, testing the attribute specified by this node,
and following the branch that corresponds to the case's value for that attribute;
this is repeated until a leaf is reached, and the case is assigned the class
associated with that leaf. Some objects, however, may have no value for some
attributes. This problem, known as the problem of missing values, can arise both
during the construction phase and during the classification phase of a decision
tree. In the latter situation, if the object being classified has no value for an
attribute tested at a node on its path, it is not possible to decide which branch
to take, and the classification process cannot be completed.
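To make the root-to-leaf procedure and the missing-value problem concrete, here is a minimal sketch of a classifier that returns a probability distribution over classes. When a tested attribute is missing, it follows every branch and weights each one by the fraction of training cases that went down it. This is the fractional-case strategy commonly associated with Quinlan's C4.5, shown only as a reference point; it is not the authors' method, which instead predicts the missing value with attribute trees and Mutual Information. The names Node, classify, branch_weights and class_counts are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    # Internal node: `attribute` is set and `children` maps each value to a subtree.
    # Leaf: `attribute` is None and `class_counts` holds training-class frequencies.
    attribute: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Fraction of training cases that followed each branch (hypothetical field).
    branch_weights: Dict[str, float] = field(default_factory=dict)
    class_counts: Dict[str, int] = field(default_factory=dict)


def classify(node, case, weight=1.0):
    """Return a {class: probability} distribution for `case` (attribute -> value).

    A missing value is an absent key or None. Values never seen during
    training are not handled (KeyError), an assumption of this sketch.
    """
    if node.attribute is None:  # leaf: class frequencies scaled by the path weight
        total = sum(node.class_counts.values())
        return {c: weight * k / total for c, k in node.class_counts.items()}
    value = case.get(node.attribute)
    if value is not None:  # known value: follow the single matching branch
        return classify(node.children[value], case, weight)
    # Missing value: explore every branch, weighted by its training frequency,
    # and sum the resulting partial distributions.
    dist = Counter()
    for v, child in node.children.items():
        dist.update(classify(child, case, weight * node.branch_weights[v]))
    return dict(dist)


if __name__ == "__main__":
    root = Node(
        attribute="outlook",
        children={"sunny": Node(class_counts={"no": 3, "yes": 1}),
                  "rain": Node(class_counts={"yes": 2})},
        branch_weights={"sunny": 0.5, "rain": 0.5},
    )
    print(classify(root, {"outlook": "sunny"}))  # {'no': 0.75, 'yes': 0.25}
    print(classify(root, {}))                    # {'no': 0.375, 'yes': 0.625}
```

With a known value the case reaches a single leaf; with the value missing, the output is still a proper distribution (it sums to 1) rather than a failed classification, which is why probabilistic output is a natural fit for this problem.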
Our objective is to classify an object with missing values. Our work is situated
in the framework of probabilistic decision trees [1, 5, 23]. We aim at using
 