Algorithm: Generate_decision_tree. Generate a decision tree from the training tuples of data partition, D.

Input:
    Data partition, D, which is a set of training tuples and their associated class labels;
    attribute_list, the set of candidate attributes;
    Attribute_selection_method, a procedure to determine the splitting criterion that "best" partitions the data tuples into individual classes. This criterion consists of a splitting_attribute and, possibly, either a split-point or splitting subset.

Output: A decision tree.

Method:
(1)  create a node N;
(2)  if tuples in D are all of the same class, C, then
(3)      return N as a leaf node labeled with the class C;
(4)  if attribute_list is empty then
(5)      return N as a leaf node labeled with the majority class in D; // majority voting
(6)  apply Attribute_selection_method(D, attribute_list) to find the "best" splitting_criterion;
(7)  label node N with splitting_criterion;
(8)  if splitting_attribute is discrete-valued and multiway splits allowed then // not restricted to binary trees
(9)      attribute_list ← attribute_list − splitting_attribute; // remove splitting_attribute
(10) for each outcome j of splitting_criterion // partition the tuples and grow subtrees for each partition
(11)     let Dj be the set of data tuples in D satisfying outcome j; // a partition
(12)     if Dj is empty then
(13)         attach a leaf labeled with the majority class in D to node N;
(14)     else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
     endfor
(15) return N;

Figure 8.3 Basic algorithm for inducing a decision tree from training tuples.
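The algorithm above can be sketched as a short recursive function. This is a minimal, illustrative implementation assuming discrete-valued attributes with multiway splits, and using information gain (lowest weighted child entropy) as a stand-in for Attribute_selection_method, which the algorithm leaves as a parameter; the data layout (a list of (feature-dict, label) pairs) and all helper names are our own, not from the text:

```python
import math
from collections import Counter

def majority_class(D):
    """Majority voting over (features, label) tuples -- used by step (5)."""
    return Counter(label for _, label in D).most_common(1)[0][0]

def entropy(D):
    n = len(D)
    counts = Counter(label for _, label in D)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def select_attribute(D, attribute_list):
    """Stand-in for Attribute_selection_method: pick the attribute whose
    split leaves the lowest weighted entropy in the partitions."""
    def weighted_child_entropy(attr):
        parts = {}
        for features, label in D:
            parts.setdefault(features[attr], []).append((features, label))
        return sum(len(p) / len(D) * entropy(p) for p in parts.values())
    return min(attribute_list, key=weighted_child_entropy)

def generate_tree(D, attribute_list):
    labels = {label for _, label in D}
    if len(labels) == 1:                        # steps (2)-(3)
        return labels.pop()
    if not attribute_list:                      # steps (4)-(5)
        return majority_class(D)
    A = select_attribute(D, attribute_list)     # steps (6)-(7)
    # All attributes here are discrete with multiway splits, so the
    # splitting attribute is always removed -- step (9).
    remaining = [a for a in attribute_list if a != A]
    branches = {}
    # Outcomes are the values of A actually seen in D, so no partition is
    # empty here; steps (12)-(13) matter when outcomes come from A's full domain.
    for v in {features[A] for features, _ in D}:          # steps (10)-(11)
        Dj = [(f, lab) for f, lab in D if f[A] == v]
        branches[v] = generate_tree(Dj, remaining)        # step (14)
    return (A, branches)                        # step (15): node N

def classify(tree, features):
    while isinstance(tree, tuple):  # internal node: (attribute, branches)
        attr, branches = tree
        tree = branches[features[attr]]
    return tree                     # leaf: a class label
```

For example, on four tuples labeled "play"/"stay" over attributes outlook and windy, generate_tree(D, ["outlook", "windy"]) returns a nested (attribute, branches) tuple that classify follows attribute-by-attribute down to a leaf label.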
If the tuples in D are all of the same class, then node N becomes a leaf and is labeled with that class (steps 2 and 3). Note that steps 4 and 5 are terminating conditions. All terminating conditions are explained at the end of the algorithm.

Otherwise, the algorithm calls Attribute_selection_method to determine the splitting criterion. The splitting criterion tells us which attribute to test at node N by determining the "best" way to separate or partition the tuples in D into individual classes (step 6). The splitting criterion also tells us which branches to grow from node N with respect to the outcomes of the chosen test. More specifically, the splitting criterion indicates the splitting attribute and may also indicate either a split-point or a splitting subset. The splitting criterion is determined so that, ideally, the resulting