Algorithm: Generate_decision_tree. Generate a decision tree from the training tuples of data partition, D.

Input:
    Data partition, D, which is a set of training tuples and their associated class labels;
    attribute_list, the set of candidate attributes;
    Attribute_selection_method, a procedure to determine the splitting criterion that "best" partitions the data tuples into individual classes. This criterion consists of a splitting_attribute and, possibly, either a split-point or splitting subset.

Output: A decision tree.

Method:
(1)  create a node N;
(2)  if tuples in D are all of the same class, C, then
(3)      return N as a leaf node labeled with the class C;
(4)  if attribute_list is empty then
(5)      return N as a leaf node labeled with the majority class in D; // majority voting
(6)  apply Attribute_selection_method(D, attribute_list) to find the "best" splitting_criterion;
(7)  label node N with splitting_criterion;
(8)  if splitting_attribute is discrete-valued and multiway splits allowed then // not restricted to binary trees
(9)      attribute_list ← attribute_list − splitting_attribute; // remove splitting_attribute
(10) for each outcome j of splitting_criterion // partition the tuples and grow subtrees for each partition
(11)     let Dj be the set of data tuples in D satisfying outcome j; // a partition
(12)     if Dj is empty then
(13)         attach a leaf labeled with the majority class in D to node N;
(14)     else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
     endfor
(15) return N;

Figure 8.3 Basic algorithm for inducing a decision tree from training tuples.
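The algorithm above can be sketched as a short recursive function. This is a minimal, illustrative implementation assuming discrete-valued attributes with multiway splits, and using information gain (lowest weighted child entropy) as a stand-in for Attribute_selection_method, which the algorithm leaves as a parameter; the data layout (a list of (feature-dict, label) pairs) and all helper names are our own, not from the text:

```python
import math
from collections import Counter

def majority_class(D):
    """Majority voting over (features, label) tuples -- used by step (5)."""
    return Counter(label for _, label in D).most_common(1)[0][0]

def entropy(D):
    n = len(D)
    counts = Counter(label for _, label in D)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def select_attribute(D, attribute_list):
    """Stand-in for Attribute_selection_method: pick the attribute whose
    split leaves the lowest weighted entropy in the partitions."""
    def weighted_child_entropy(attr):
        parts = {}
        for features, label in D:
            parts.setdefault(features[attr], []).append((features, label))
        return sum(len(p) / len(D) * entropy(p) for p in parts.values())
    return min(attribute_list, key=weighted_child_entropy)

def generate_tree(D, attribute_list):
    labels = {label for _, label in D}
    if len(labels) == 1:                        # steps (2)-(3)
        return labels.pop()
    if not attribute_list:                      # steps (4)-(5)
        return majority_class(D)
    A = select_attribute(D, attribute_list)     # steps (6)-(7)
    # All attributes here are discrete with multiway splits, so the
    # splitting attribute is always removed -- step (9).
    remaining = [a for a in attribute_list if a != A]
    branches = {}
    # Outcomes are the values of A actually seen in D, so no partition is
    # empty here; steps (12)-(13) matter when outcomes come from A's full domain.
    for v in {features[A] for features, _ in D}:          # steps (10)-(11)
        Dj = [(f, lab) for f, lab in D if f[A] == v]
        branches[v] = generate_tree(Dj, remaining)        # step (14)
    return (A, branches)                        # step (15): node N

def classify(tree, features):
    while isinstance(tree, tuple):  # internal node: (attribute, branches)
        attr, branches = tree
        tree = branches[features[attr]]
    return tree                     # leaf: a class label
```

For example, on four tuples labeled "play"/"stay" over attributes outlook and windy, generate_tree(D, ["outlook", "windy"]) returns a nested (attribute, branches) tuple that classify follows attribute-by-attribute down to a leaf label.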
If the tuples in D are all of the same class, then node N becomes a leaf and is labeled with that class (steps 2 and 3). Note that steps 4 and 5 are terminating conditions. All terminating conditions are explained at the end of the algorithm.

Otherwise, the algorithm calls Attribute_selection_method to determine the splitting criterion. The splitting criterion tells us which attribute to test at node N by determining the "best" way to separate or partition the tuples in D into individual classes (step 6). The splitting criterion also tells us which branches to grow from node N with respect to the outcomes of the chosen test. More specifically, the splitting criterion indicates the splitting attribute and may also indicate either a split-point or a splitting subset. The splitting criterion is determined so that, ideally, the resulting