Table 2.2 The cumulative results for the XOR and HYPER series of data sets
     XOR                                         HYPER
DIM  TP    FP   FN    Sensitivity (%)  PPV (%)   TP    FP   FN   Sensitivity (%)  PPV (%)
2    51.2  0.3  0.2   98               98        -     -    -    -                -
3    42.1  0.2  8.2   81               97        50.5  0.1  1.3  97               99.7
4    37.0  0.3  15.0  65               99        52.2  0.1  0.7  96               99.6
5    14.2  0.3  36.2  23               97        47.7  0.1  5.3  91               99.0
The average numbers of false positives and false negatives, as well as the average sensitivity
and PPV, were computed over the entire range of parameters
Table 2.3 Cumulative results for four dimensional data set
Ntotal  Mean TP  Mean FP  Mean FN  Mean sensitivity (%)  Mean PPV (%)
100     24.5     0.7      0.8      91.7                  97.4
200     40.1     0.5      0.9      90.2                  98.7
500     61.6     0.1      6.6      81.5                  99.8
1,000   52.4     0.3      15.7     53.6                  99.4
2,000   48.2     0.1      19.9     59.7                  99.9
5,000   29.7     0.1      42.6     33.8                  99.6
10,000  25.6     0.1      57.2     34.2                  99.6
The average numbers of true positives, false positives and false negatives, together with the
average sensitivity and PPV, are displayed for varying total numbers of variables. The averaging
was performed over a varying number of combination variables
should be expected in 10 runs of the Boruta algorithm. Both sensitivity and PPV are very
high for the sets in the HYPER series; hence a deeper analysis is devoted to the more
difficult XOR series.
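For readers who want to check the two quality measures by hand, the following minimal sketch uses the standard definitions, sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP), and applies them to the mean counts of the three-dimensional XOR row of Table 2.2. The function and variable names are illustrative only. The values obtained this way need not coincide with the tabulated 81% and 97%, presumably because the table averages the per-run ratios rather than taking the ratio of the averaged counts; this explanation is an assumption, not something stated in the text.

```python
def sensitivity(tp: float, fn: float) -> float:
    """Fraction of truly relevant variables that were found: TP / (TP + FN)."""
    return tp / (tp + fn)


def ppv(tp: float, fp: float) -> float:
    """Fraction of reported variables that are truly relevant: TP / (TP + FP)."""
    return tp / (tp + fp)


# Mean counts taken from the three-dimensional XOR row of Table 2.2.
tp, fp, fn = 42.1, 0.2, 8.2

# Ratio of the mean counts; this is not the same as the mean of per-run ratios.
xor3_sensitivity = sensitivity(tp, fn)
xor3_ppv = ppv(tp, fp)

print(f"sensitivity = {xor3_sensitivity:.1%}, PPV = {xor3_ppv:.1%}")
```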
The four-dimensional data sets are examined in closer detail in Table 2.3,
where the results for a range of total numbers of variables are presented. It is clear that
the sensitivity of the algorithm drops with an increasing number of variables, in line
with the number of false positive discoveries.
The drop in sensitivity with an increasing number of variables is expected behaviour.
When the number of variables is large, the chance that a given variable is used in
one of the first few splits of a tree is diminished; hence the measured impact of an
individual variable is subject to larger variability than in systems with a small number
of variables. It is therefore more difficult to discern relevant variables with a lesser
impact from random ones. This effect can be counteracted by increasing the number
of trees in the system; see Table 2.4, where cumulative data for all four-dimensional
sets are presented, as well as a more detailed analysis of a five-dimensional set.
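The dilution argument above can be illustrated with a rough back-of-the-envelope sketch. It assumes, which the text does not state, that each split of a tree considers mtry = floor(sqrt(N)) randomly drawn candidate variables (a common default for classification forests), so that a given variable is a candidate at a single split with probability mtry / N. The function names and the choice of counting the first seven splits of each tree (the top three levels of a binary tree) are hypothetical and serve only as illustration.

```python
import math


def candidate_probability(n_variables: int) -> float:
    """Chance that one given variable is among the candidates at a single split.

    Assumes mtry = floor(sqrt(n_variables)) candidates drawn without
    replacement; this default is an assumption, not taken from the text.
    """
    mtry = max(1, math.isqrt(n_variables))
    return mtry / n_variables


def expected_candidacies(n_variables: int, n_trees: int, top_splits: int = 7) -> float:
    """Expected number of times a variable is offered as a candidate within
    the first `top_splits` splits of each tree, summed over the whole forest."""
    return n_trees * top_splits * candidate_probability(n_variables)


# The per-split chance shrinks as the total number of variables grows.
for n in (100, 1000, 10000):
    print(n, candidate_probability(n), expected_candidacies(n, n_trees=500))
```

Under these assumptions, going from 100 to 10,000 variables reduces the per-split chance tenfold, so roughly ten times as many trees are needed to give each variable the same number of opportunities near the top of a tree.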
Another interesting effect is presented in Table 2.5. Systems with different numbers
of relevant variables differ in how their sensitivity changes as the total number of
variables increases. For example, when the total number of variables is 500,
the sensitivity is 100% for a system with 54 relevant variables, whereas it is 87%
for a system with 204 relevant variables. When the number of random variables is
 