Database Reference
In-Depth Information
The different subsamples are used to make proposals about the feature that should
be used to split in the current node. The split function used in this work is the gain
ratio criterion (the same used by Quinlan in C4.5). The decision about which feature
will be used to make the split in a node of the Consolidated Tree (CT) is accorded
among the different proposals. The decision is made by a voting process node by
node. Based on this decision, all the subsamples are divided using the same feature.
The iterative process is described in Algorithm 1.
The algorithm starts extracting a set of subsamples ( Number_Samples ) from the
original training set. The subsamples can be obtained based on the desired resampling
technique ( Resampling_Mode ).
Decision tree's construction algorithms, usually divide the initial sample in several
data partitions. In our algorithm, LS i contains the data partitions created from each
subsample S i .
Algorithm 1. Consolidated Trees' Construction Algorithm (CTC)
Generate Number_Samples subsamples ( S i ) from S with Resampling_Mode method.
CurrentNode := RootNode
for i := 1 to Number_Samples
LS i := { S i }
end for
repeat
for i := 1 to Number_Samples
CurrentS i := First ( LS i )
LS i := LS i - CurrentS i
Induce the best split (X,B) i for CurrentS i
end for
Obtain the consolidated pair ( X c ,B c ), based on (X,B) i , 1 ≤ i Number_Samples
if ( X c ,B c ) ≠ Not_Split
Split CurrentNode based on ( X c ,B c )
for i := 1 to Number_Samples
Divide CurrentS i based on ( X c ,B c ) to obtain n subsamples { S 1 i , … S n i }
LS i := { S 1 i , … S n i } + LS i
end for
else consolidate CurrentNode as a leaf
end if
Decision in
each subsample
Force in all
subsamples
(Consolidate)
CurrentNode := NextNode
until
i, LS i is empty
In the algorithm, the consolidation of a node is divided in two main parts. The first
one where a separate analysis is done in each of the subsamples, and the second one,
where based on all the proposals, a decision to consolidate the node is made. At this
point, all the subsamples are forced to make the same split.
The pair (X,B) i is the split proposal for the first partition in LS i . X is the feature
selected to split and B indicates the proposed branches or criteria to divide the data in
the current node. X c is the feature obtained by a voting process among all the
Search WWH ::




Custom Search