Table 2.4. FS results for SPLIT6 and SPLIT3.

Data set  Selection method  Cost function   Dim.  Chosen features
--------  ----------------  --------------  ----  ---------------------------------------
SPLIT6    1rstOP            1. Rank         4     1, 32, 118, 141
SPLIT6    SBS               q_si = 0.99487  9     32, 65, 79, 114, 119, 142, 191, 198, 199
SPLIT6    SBS               q_oi = 1.0      8     32, 65, 86, 129, 131, 142, 191, 201
SPLIT6    SBS               q_ci            15    78, 79, 114, 115, 118, 119, 120, 121,
                                                  126, 129, 140, 141, 142, 143, 144
SPLIT3    1rstOP            1. Rank         2     118, 141
SPLIT3    1rstOP            1.-2. Rank      4     118, 120, 140, 141
SPLIT3    1rstOP            1.-3. Rank      6     118, 119, 120, 140, 141, 144
SPLIT3    SBS               q_si = 1.0      1     126
SPLIT3    SBS               q_oi = 1.0      2     118, 205
SPLIT3    SBS               q_ci            15    78, 79, 114, 115, 118, 119, 120, 121,
                                                  126, 129, 140, 141, 142, 143, 144

inantly responsible for the observed split. However, it must be kept in mind
that weaker correlations of potential interest to the data analyst are removed
by this method, which is tailored to the needs of classification. Only those
variables of value for optimum separability or optimum overlap will be chosen.
The measure q_oi saturated early in the selection process, i.e., the maximum
cost function value of 1.0 was reached very early, which means that the measure
lost its capability to properly distinguish between the contributions of the
remaining variables. Correlating the achieved result with the underlying
physical meaning of the variables showed that only a fraction of the relevant
variables were identified (see Table 2.5). For comparison purposes, the
described first-order method (1rstOP) has also been applied, employing only the
first-rank (highest-ranking) variables for pairwise class separation. Some of
the relevant variables were found far faster than with the higher-order
methods, i.e., in seconds vs. several hours on a state-of-the-art PC. However,
the method also identifies an irrelevant variable and regrettably leaves out
numerous relevant ones.
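
The first-order idea of ranking variables individually for each pair of
classes and keeping only the top rank positions can be sketched as follows.
This is a minimal illustration, not the 1rstOP implementation itself: the
separability score (mean gap divided by the summed spreads), the function
names, and the toy data are all assumptions made for the example.

```python
from itertools import combinations
from statistics import mean, pstdev


def pairwise_score(a, b):
    """Hypothetical univariate separability: |mean gap| / (sum of spreads)."""
    gap = abs(mean(a) - mean(b))
    spread = pstdev(a) + pstdev(b)
    if spread == 0.0:
        return float("inf") if gap > 0.0 else 0.0
    return gap / spread


def first_order_select(samples, labels, max_rank=1):
    """Union of the top `max_rank` features over all pairs of classes.

    samples: list of equal-length feature vectors
    labels:  class label of each sample
    """
    n_features = len(samples[0])
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    chosen = set()
    for c1, c2 in combinations(sorted(by_class), 2):
        scores = []
        for j in range(n_features):
            a = [x[j] for x in by_class[c1]]
            b = [x[j] for x in by_class[c2]]
            scores.append((pairwise_score(a, b), j))
        # Highest score first; ties are broken by the lower feature index.
        scores.sort(key=lambda t: (-t[0], t[1]))
        chosen.update(j for _, j in scores[:max_rank])
    return sorted(chosen)


# Toy data (assumed): features 0 and 1 separate the classes, feature 2 is noise.
SAMPLES = [
    [0.0, 5.0, 1.0], [0.2, 5.1, 3.0],   # class "A"
    [4.0, 5.0, 2.0], [4.2, 5.2, 0.0],   # class "B"
    [4.1, 9.0, 1.5], [4.3, 9.2, 2.5],   # class "C"
]
LABELS = ["A", "A", "B", "B", "C", "C"]

print(first_order_select(SAMPLES, LABELS, max_rank=1))  # -> [0, 1]
print(first_order_select(SAMPLES, LABELS, max_rank=2))  # -> [0, 1, 2]
```

On the toy data, the first rank alone returns the two informative features,
while admitting the second rank also pulls in the noise feature. Each feature
is scored in isolation, so the procedure is fast, but it cannot detect
redundancy or interactions between variables.
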
For SPLIT3, only one variable was selected for q_si and only two for q_oi.
Both measures saturated early in the selection process. In both cases the
class regions are not compact and show considerable scatter. Though a lean
classification system could be devised from this result, for information
gathering and knowledge discovery this result is far from desirable. The
application of the 1rstOP delivered similar results for first-rank variables:
only a few of the relevant variables were identified. As the included rank
positions were increased, more relevant variables were included (see Table
2.4). However, it is difficult for the user to judge which value of the
rank-position parameter should be set to include all relevant variables and
avoid irrelevant ones. Also, redundant variables could still be present in the
selection. Due to its speed,
 