proposed X. B_c, in turn, is the median of the proposed cut values when X_c is
continuous, and the set of all possible values of the feature when X_c is discrete.
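The consolidation of a split can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name consolidate_split, the (feature, cut) representation of a proposal, the use of a simple majority vote to choose X_c, and the restriction of the median to the cut values proposed for X_c are all assumptions made for the example.

```python
from collections import Counter
from statistics import median


def consolidate_split(proposals, feature_values):
    """Combine the split proposals coming from several subsamples.

    proposals: list of (feature, cut) pairs, one per subsample; cut is
        None when the proposed feature is discrete (hypothetical format).
    feature_values: dict mapping each discrete feature to its possible values.
    Returns (X_c, B_c).
    """
    # Assumption: X_c is the feature proposed by the most subsamples.
    votes = Counter(feature for feature, _ in proposals)
    x_c, _ = votes.most_common(1)[0]

    # Assumption: only cut values proposed for X_c enter the median.
    cuts = [cut for feature, cut in proposals
            if feature == x_c and cut is not None]
    if cuts:
        # Continuous feature: B_c is the median of the proposed cut values.
        return x_c, median(cuts)
    # Discrete feature: B_c is every possible value of the feature.
    return x_c, sorted(feature_values[x_c])


discrete = consolidate_split(
    [("sex", None), ("age", 33), ("sex", None)],
    {"sex": {"m", "f"}},
)
continuous = consolidate_split(
    [("age", 28), ("age", 30), ("age", 33)],
    {},
)
```

With the proposals above, the first call selects "sex" with both of its values as branches, and the second selects "age" with the middle cut value.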
When a node is consolidated as a leaf node, the a posteriori probabilities associated
with it are calculated by averaging the a posteriori probabilities obtained from the
data partitions related to that node in all the subsamples.
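A minimal Python sketch of this averaging step is given below. It assumes that the a posteriori probabilities of each partition are estimated as relative class frequencies and that empty partitions are skipped; neither detail is stated in the text, and the function name leaf_probabilities is hypothetical.

```python
def leaf_probabilities(partitions, classes):
    """Average the per-subsample class probabilities at a leaf node.

    partitions: list of lists of class labels, one list per subsample,
        holding the examples of that subsample that reach the leaf.
    classes: ordered list of the class labels.
    """
    per_subsample = []
    for labels in partitions:
        if not labels:
            continue  # assumption: an empty partition contributes nothing
        # Assumption: probabilities are relative class frequencies.
        per_subsample.append(
            [labels.count(c) / len(labels) for c in classes]
        )
    if not per_subsample:
        raise ValueError("no data reaches this leaf in any subsample")
    n = len(per_subsample)
    # Average each class probability over the subsamples.
    return [sum(column) / n for column in zip(*per_subsample)]


probs = leaf_probabilities(
    [["a", "a", "b", "b"], ["a", "b", "b", "b"]],
    ["a", "b"],
)
```

Here the two partitions give (0.5, 0.5) and (0.25, 0.75), so the leaf is assigned the averaged probabilities (0.375, 0.625).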
The resampling technique and the number of subsamples used in the tree's
building process are important aspects of the algorithm [16]. There are many possible
combinations for the Resampling_Mode: size of the subsamples (100%, 75%, 50%,
etc. of the original training set), with or without replacement, stratified
or not, etc. In previous work, stratified subsamples of 75% and 50% of the original
training set and bootstrap samples were used for experimentation, and we
observed that the CTC algorithm behaves similarly for all of them. In this paper
we present results for stratified subsamples of 75% because the quality of the
achieved results is slightly better than with the other combinations.
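One way to draw stratified subsamples of 75% without replacement is sketched below in Python. The function name stratified_subsamples, its parameters, and the rounding of each class quota are assumptions made for illustration; the text does not specify how the subsamples are generated in practice.

```python
import random
from collections import defaultdict


def stratified_subsamples(labels, n_samples, fraction=0.75, seed=0):
    """Draw n_samples stratified subsamples without replacement.

    labels: class label of each training example.
    Returns a list of sorted index lists, one per subsample.
    """
    rng = random.Random(seed)

    # Group the example indices by class so each class keeps its share.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    subsamples = []
    for _ in range(n_samples):
        chosen = []
        for indices in by_class.values():
            # Assumption: the per-class quota is rounded to the nearest integer.
            quota = round(fraction * len(indices))
            chosen.extend(rng.sample(indices, quota))
        subsamples.append(sorted(chosen))
    return subsamples


labels = ["a"] * 8 + ["b"] * 4
subsamples = stratified_subsamples(labels, n_samples=3)
```

Each subsample here holds 9 of the 12 examples, 6 of class "a" and 3 of class "b", so the original class distribution is preserved.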
Once the consolidated tree has been built, it works the same way a decision tree
does.
[Figure 1: diagram of the consolidated tree's progress over three steps. Step 1: the C4.5 proposals from the subsamples are sex:{m,f}, sex:{m,f} and age: cut=33; the decision is to split on sex, with branches m and f. Step 2: the proposals are age: cut=28, age: cut=30 and color:{r,g,b}; the decision is to split on age, with branches ≤30 and >30. Step 3: the proposals are color:{r,g,b}, not-split and not-split; the decision is not-split, and the process finishes.]
Fig. 1. Example of a Consolidated Tree's (CT) building process based on C4.5 (gain ratio)
We present an example of how a CT is built in Fig. 1. In the first step, the variable
“sex” (X) with branches “m” and “f” (B) is proposed by two of the samples, and
“age” (X) with cut value “33” (B) by another one. In the second step, the
proposed variables are “age” for two of the samples and “color” for a third one. If the