(Similarity) counts the common nodes between two trees. It is calculated starting from
the root and traversing the tree level by level. Two nodes are counted as common if they
coincide in the feature used to make the split, in the proposed branches or stratification,
and in their position in the tree. When a differing node is found, the subtree
under that node is not taken into account. For a set of trees T_set with m trees, the
Common value is calculated as the average of all possible pairwise
comparisons (Equation 1):
$$\mathrm{Common}(T_{set}) = \frac{2}{m(m-1)} \sum_{\substack{k,l=0 \\ k<l}}^{m-1} \mathrm{Similarity}(T_k, T_l) \qquad (1)$$
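The computation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `Node` class, the function names and the example trees are hypothetical, and it assumes that only split nodes are counted (leaves carry no split feature) and that two nodes are in the same position when they are reached through the same sequence of branches.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical tree node: feature is None for a leaf."""
    feature: Optional[str] = None
    # Maps a branch label (the stratification) to the subtree below it.
    children: Dict[str, "Node"] = field(default_factory=dict)


def similarity(a: Node, b: Node) -> int:
    """Count common split nodes, level by level, starting from the roots."""
    count = 0
    frontier = [(a, b)]
    while frontier:
        next_frontier = []
        for x, y in frontier:
            same_split = (
                x.feature is not None
                and x.feature == y.feature
                and x.children.keys() == y.children.keys()
            )
            if not same_split:
                continue  # the subtree under a differing node is ignored
            count += 1
            for branch in x.children:
                next_frontier.append((x.children[branch], y.children[branch]))
        frontier = next_frontier
    return count


def common(trees: List[Node]) -> float:
    """Average Similarity over all m(m-1)/2 unordered pairs (Equation 1)."""
    m = len(trees)
    if m < 2:
        raise ValueError("Common needs at least two trees")
    total = sum(similarity(s, t) for s, t in combinations(trees, 2))
    return 2.0 * total / (m * (m - 1))


# Made-up example trees, for illustration only.
t1 = Node("age", {"<=30": Node("income", {"low": Node(), "high": Node()}),
                  ">30": Node()})
t2 = Node("age", {"<=30": Node("income", {"low": Node(), "high": Node()}),
                  ">30": Node("sex", {"m": Node(), "f": Node()})})
t3 = Node("income", {"low": Node(), "high": Node()})

print(similarity(t1, t2))    # 2: the root split and the "income" split coincide
print(common([t1, t2, t3]))  # (2 + 0 + 0) / 3 pairs
```

In this toy set, t3 differs from the other two trees at the root, so it contributes nothing to either of its pairs, which illustrates how a single early disagreement removes the whole subtree from the count.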
From a practical point of view, Common quantifies the structural stability of the
classification algorithm, whereas the error quantifies the quality of the
explanation given by the tree. Evidently, an improvement in structural stability must
be accompanied by a reasonable error rate. Our main goal has been to increase stability
with no loss in accuracy.
As a summary of previous work, we can say that the behaviour of the CTC
algorithm improves as the value of Number_Samples increases. When this value is
20 or greater, the results for CTC are better on average than the results for any of the
versions of C4.5. Table 2 shows the results of the comparison of CTC (with N_S =
30), C4.5_100, C4.5_union, and C4.5_not_resampling.
Values related to Error and Common are given (column R.Dif is always calculated
as the relative difference between the CTC results and the results of C4.5). The table
shows that in 16 (C4.5_100), 17 (C4.5_union) and 9 (C4.5_not_resampling) domains out of 20,
Table 2. Average results of Error and Common for every domain. CTC (N_S = 30), C4.5_100 (C4.5_1), C4.5_union (C4.5_u) and C4.5_not_resampling (C4.5_n_r) are shown.

| Domain | Error CTC | Error C4.5_1 | R.Dif | Error C4.5_u | R.Dif | Error C4.5_n_r | R.Dif | Common CTC | Common C4.5_1 | Common C4.5_u | Common C4.5_n_r |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Breast-W | 5.58 | 6.06 | -7.99 | 6.26 | -10.87 | 5.63 | -0.99 | 2.94 | 1.67 | 19.47 | 2.38 |
| Heart-C | 23.12 | 24.57 | -5.88 | 27.94 | -17.23 | 23.96 | -3.48 | 7.36 | 1.46 | 16.11 | 3.18 |
| Hypo | 0.72 | 0.78 | -7.30 | 1.23 | -41.13 | 0.71 | 1.31 | 3.97 | 2.63 | 24.34 | 3.39 |
| Lymph | 20.01 | 22.02 | -9.11 | 24.83 | -19.42 | 20.44 | -2.09 | 7.95 | 2.10 | 17.71 | 3.23 |
| Credit-G | 28.03 | 28.28 | -0.89 | 32.71 | -14.29 | 28.50 | -1.64 | 12.25 | 2.33 | 42.97 | 4.42 |
| Segment210 | 12.72 | 13.71 | -7.20 | 12.75 | -0.26 | 13.61 | -6.52 | 5.38 | 1.96 | 8.19 | 1.95 |
| Iris | 4.63 | 5.75 | -19.39 | 6.29 | -26.35 | 6.63 | -30.14 | 2.80 | 2.06 | 5.87 | 3.20 |
| Glass | 30.26 | 32.48 | -6.83 | 30.28 | -0.07 | 31.55 | -4.08 | 6.62 | 2.65 | 17.27 | 6.01 |
| Voting | 3.42 | 4.17 | -17.87 | 5.47 | -37.49 | 3.41 | 0.41 | 4.45 | 2.19 | 22.21 | 4.21 |
| Hepatitis | 20.70 | 20.68 | 0.11 | 22.03 | -6.03 | 20.29 | 2.01 | 4.06 | 0.85 | 12.23 | 3.25 |
| Soybean-L | 11.18 | 13.53 | -17.37 | 10.92 | 2.34 | 11.02 | 1.46 | 15.54 | 6.18 | 22.95 | 12.14 |
| Sick-E | 2.32 | 2.21 | 4.93 | 2.91 | -20.22 | 1.96 | 18.54 | 7.73 | 4.75 | 16.74 | 8.13 |
| Liver | 33.94 | 35.90 | -5.46 | 35.15 | -3.44 | 35.31 | -3.88 | 7.06 | 1.19 | 13.57 | 3.17 |
| Credit-A | 14.82 | 14.81 | 0.03 | 18.42 | -19.58 | 14.51 | 2.11 | 6.04 | 2.14 | 26.19 | 3.92 |
| Vehicle | 27.82 | 28.30 | -1.70 | 26.55 | 4.80 | 27.61 | 0.76 | 18.30 | 7.11 | 32.97 | 13.57 |
| Breast-Y | 26.78 | 28.35 | -5.52 | 34.47 | -22.30 | 25.81 | 3.78 | 2.23 | 0.75 | 34.99 | 1.16 |
| Heart-H | 21.38 | 20.89 | 2.35 | 22.45 | -4.75 | 21.02 | 1.69 | 4.50 | 1.41 | 25.05 | 1.66 |
| Segment2310 | 3.39 | 3.96 | -14.49 | 3.20 | 5.74 | 3.24 | 4.46 | 22.54 | 10.20 | 29.31 | 14.84 |
| Spam | 7.31 | 7.73 | -5.46 | 7.96 | -8.17 | 7.25 | 0.74 | 16.69 | 4.55 | 27.87 | 9.97 |
| Faithful | 1.48 | 1.50 | -1.61 | 2.42 | -38.92 | 1.48 | -0.18 | 10.76 | 6.54 | 52.86 | 8.18 |
| Average 75% | 14.98 | 15.81 | -6.68 | 16.73 | -14.07 | 15.15 | -0.25 | 8.46 | 3.24 | 23.44 | 5.60 |
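The R.Dif columns of Table 2 can be sanity-checked with a short Python sketch. It assumes that R.Dif is the percentage (CTC - C4.5) / C4.5 x 100, a formula the text does not spell out; the function and variable names are hypothetical. Because the table lists rounded errors, a recomputed value agrees with the printed R.Dif only approximately. Counting the negative entries gives the number of domains in which the CTC error is lower than that of each C4.5 variant.

```python
def rel_diff(ctc: float, c45: float) -> float:
    """Assumed definition of R.Dif: relative difference of CTC w.r.t. C4.5, in %."""
    return (ctc - c45) / c45 * 100.0


# Error R.Dif values copied from Table 2, one row per domain:
# (vs C4.5_100, vs C4.5_union, vs C4.5_not_resampling)
R_DIF = {
    "Breast-W":    (-7.99, -10.87, -0.99),
    "Heart-C":     (-5.88, -17.23, -3.48),
    "Hypo":        (-7.30, -41.13, 1.31),
    "Lymph":       (-9.11, -19.42, -2.09),
    "Credit-G":    (-0.89, -14.29, -1.64),
    "Segment210":  (-7.20, -0.26, -6.52),
    "Iris":        (-19.39, -26.35, -30.14),
    "Glass":       (-6.83, -0.07, -4.08),
    "Voting":      (-17.87, -37.49, 0.41),
    "Hepatitis":   (0.11, -6.03, 2.01),
    "Soybean-L":   (-17.37, 2.34, 1.46),
    "Sick-E":      (4.93, -20.22, 18.54),
    "Liver":       (-5.46, -3.44, -3.88),
    "Credit-A":    (0.03, -19.58, 2.11),
    "Vehicle":     (-1.70, 4.80, 0.76),
    "Breast-Y":    (-5.52, -22.30, 3.78),
    "Heart-H":     (2.35, -4.75, 1.69),
    "Segment2310": (-14.49, 5.74, 4.46),
    "Spam":        (-5.46, -8.17, 0.74),
    "Faithful":    (-1.61, -38.92, -0.18),
}

# Number of domains with a negative R.Dif (lower CTC error), per C4.5 variant.
counts = tuple(
    sum(1 for row in R_DIF.values() if row[col] < 0) for col in range(3)
)
print(counts)  # (16, 17, 9)

# Breast-W, CTC vs C4.5_100, recomputed from the rounded errors in the table.
print(round(rel_diff(5.58, 6.06), 2))  # -7.92 (the table prints -7.99)
```

The three counts coincide with the 16, 17 and 9 domains out of 20 quoted in the text for C4.5_100, C4.5_union and C4.5_not_resampling.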