Algorithm: Generate decision tree. Generate a decision tree from the training tuples of data partition, D.

Input:
    Data partition, D, which is a set of training tuples and their associated class labels;
    attribute list, the set of candidate attributes;
    Attribute selection method, a procedure to determine the splitting criterion that "best" partitions the data tuples into individual classes. This criterion consists of a splitting attribute and, possibly, either a split-point or a splitting subset.

Output: A decision tree.

Method:
(1)  create a node N;
(2)  if tuples in D are all of the same class, C, then
(3)      return N as a leaf node labeled with the class C;
(4)  if attribute list is empty then
(5)      return N as a leaf node labeled with the majority class in D; // majority voting
(6)  apply Attribute selection method(D, attribute list) to find the "best" splitting criterion;
(7)  label node N with splitting criterion;
(8)  if splitting attribute is discrete-valued and multiway splits allowed then // not restricted to binary trees
(9)      attribute list ← attribute list − splitting attribute; // remove splitting attribute
(10) for each outcome j of splitting criterion // partition the tuples and grow subtrees for each partition
(11)     let Dj be the set of data tuples in D satisfying outcome j; // a partition
(12)     if Dj is empty then
(13)         attach a leaf labeled with the majority class in D to node N;
(14)     else attach the node returned by Generate decision tree(Dj, attribute list) to node N;
     endfor
(15) return N;

Figure 8.3 Basic algorithm for inducing a decision tree from training tuples.
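Read as a program, the recursion in Figure 8.3 is short. The following Python sketch mirrors the figure step by step; it is an illustration, not code from the text. The dict-based tuple representation, the domains mapping of each attribute to its possible outcomes, and the select attribute callback (standing in for Attribute selection method) are all assumptions made for the sketch.

from collections import Counter

def majority_class(D):
    """Return the most common class label among the tuples in D (majority voting)."""
    return Counter(t["class"] for t in D).most_common(1)[0][0]

def generate_decision_tree(D, attribute_list, domains, select_attribute):
    """Recursive tree induction following Figure 8.3.

    D                -- list of training tuples; each tuple is a dict of
                        attribute -> value, with the class label under "class"
    attribute_list   -- list of candidate attribute names
    domains          -- dict mapping each attribute to its set of possible
                        values (the outcomes of a test on that attribute)
    select_attribute -- stands in for Attribute selection method: returns the
                        "best" splitting attribute for (D, attribute_list)
    """
    classes = {t["class"] for t in D}
    if len(classes) == 1:                          # steps 2-3: all one class, C
        return {"leaf": classes.pop()}
    if not attribute_list:                         # steps 4-5: majority voting
        return {"leaf": majority_class(D)}

    A = select_attribute(D, attribute_list)        # step 6: splitting criterion
    node = {"attribute": A, "branches": {}}        # step 7: label node N

    # step 9: A is discrete-valued and multiway splits are allowed,
    # so remove the splitting attribute from the candidate list
    remaining = [a for a in attribute_list if a != A]

    for outcome in domains[A]:                     # step 10: each outcome j
        Dj = [t for t in D if t[A] == outcome]     # step 11: the partition Dj
        if not Dj:                                 # steps 12-13: empty partition
            node["branches"][outcome] = {"leaf": majority_class(D)}
        else:                                      # step 14: grow the subtree
            node["branches"][outcome] = generate_decision_tree(
                Dj, remaining, domains, select_attribute)
    return node                                    # step 15

Passing the attribute domains explicitly is what makes steps 12 and 13 reachable: an outcome in the domain may match no tuple in D, in which case that branch gets a leaf labeled with the majority class of D.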
If the tuples in D are all of the same class, then node N becomes a leaf and is labeled
with that class (steps 2 and 3). Note that steps 4 and 5 are terminating conditions. All
terminating conditions are explained at the end of the algorithm.
Otherwise, the algorithm calls Attribute selection method to determine the splitting criterion. The splitting criterion tells us which attribute to test at node N by determining the "best" way to separate or partition the tuples in D into individual classes (step 6). The splitting criterion also tells us which branches to grow from node N with respect to the outcomes of the chosen test. More specifically, the splitting criterion indicates the splitting attribute and may also indicate either a split-point or a splitting subset. The splitting criterion is determined so that, ideally, the resulting partitions at each branch are as "pure" as possible; a partition is pure if all the tuples in it belong to the same class.
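The text leaves Attribute selection method abstract here. One common concrete choice for discrete-valued attributes is information gain (the measure used by ID3): pick the attribute whose test yields the greatest reduction in the expected information needed to classify a tuple. A minimal sketch, reusing the tuple representation assumed above:

import math
from collections import Counter

def entropy(D):
    """Info(D): expected information needed to classify a tuple in D."""
    n = len(D)
    counts = Counter(t["class"] for t in D)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(D, A):
    """Gain(A) = Info(D) - Info_A(D) for a discrete-valued attribute A."""
    n = len(D)
    info_a = 0.0
    for value in {t[A] for t in D}:
        Dj = [t for t in D if t[A] == value]
        info_a += (len(Dj) / n) * entropy(Dj)
    return entropy(D) - info_a

def select_attribute(D, attribute_list):
    """One possible Attribute selection method: highest information gain."""
    return max(attribute_list, key=lambda A: information_gain(D, A))

Plugged in as the select attribute argument of generate decision tree above, this gives ID3-style induction; other measures such as gain ratio or the Gini index could be substituted the same way.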
 