Table 2.2. Rank order for first-order feature selection computed for visual inspection feature data based on the parametric (left columns) and nonparametric (right columns) assessment measure.

              Parametric          Nonparametric
  Feature     R      C_1-2        R      C_1-2
  x_1         5      0.2917       5      0.6977
  x_2         4      0.6489       3      0.8211
  x_3         1      1.1558       4      0.7058
  x_4         3      0.8547       2      0.8808
  x_5         2      1.0047       1      0.9270
measure q_o (or q_oi) separately for each individual feature. This yields a nonparametric counterpart to the parametric measure given earlier. For the Iris data, the selection result is identical. In Table 2.2, however, a feature set computed from images of a practical visual inspection problem is subjected to both the parametric and the nonparametric first-order feature selection scheme.
For the considered example data set, only the nonparametric measure yields the a priori known correct solution. In summary, first-order selection schemes are a special case of heuristic approaches to the otherwise NP-complete feature selection problem. Suboptimal solutions can be found at very low computational cost. Employing the nonparametric measure provides more robustness, owing to its relaxed distribution assumptions, at a moderate increase in cost, which scales with the sample set size as O(N^2). Further, simple first-order selection can be employed as a screening step to single out variables that already carry distinct meaning on their own, with more complex search strategies then applied to the residual variables; a sketch of such a first-order ranking follows below.
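The following minimal sketch, in Python, illustrates first-order ranking under stated assumptions: the book's actual measures (q_o and the values tabulated as C_1-2) are defined earlier in the chapter and are not reproduced here, so a Fisher-like two-class criterion stands in for the parametric merit and a pairwise inter-class comparison (O(N^2) in the sample set size, matching the cost noted above) stands in for the nonparametric merit. All function names and the example data are illustrative.

import numpy as np

def parametric_merit(x, y):
    """Fisher-like two-class criterion for a single feature: squared mean
    distance over summed variances (a stand-in for the book's parametric
    measure, which is defined earlier in the chapter)."""
    a, b = x[y == 0], x[y == 1]
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var() + 1e-12)

def nonparametric_merit(x, y):
    """Nonparametric stand-in: compares all inter-class sample pairs
    (O(N^2) in the sample set size) and returns the fraction of
    consistently ordered pairs, an AUC-like overlap measure."""
    a, b = x[y == 0], x[y == 1]
    diffs = a[:, None] - b[None, :]        # all inter-class pair differences
    return max((diffs > 0).mean(), (diffs < 0).mean())

def first_order_ranking(X, y, merit):
    """Rank features by evaluating the merit separately for each feature."""
    scores = np.array([merit(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(-scores), scores     # best feature first

# Illustrative two-class data with one deliberately informative feature:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, 100)
X[y == 1, 2] += 1.5                        # make x_3 discriminative
for name, merit in (("parametric", parametric_merit),
                    ("nonparametric", nonparametric_merit)):
    order, scores = first_order_ranking(X, y, merit)
    print(name, order + 1, np.round(scores, 4))

Because each feature is assessed in isolation, the scheme needs only M merit evaluations, in contrast to the 2^M subsets faced by the higher-order techniques discussed next.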
Higher-Order Selection Techniques. Higher-order correlations or dependencies among features require computing the merit of a feature with regard to a tuple of other features. In the limit, the effect of a given feature with regard to all other features has to be considered. As mentioned before, this is a problem of combinatorial optimization, and the best possible solution, i.e., the global optimum, can be found by exhaustive search. Due to the exponential increase in the number of possible combinations, which grows as 2^M for the binary selection problem with M features, and the underlying NP-completeness of the problem, exhaustive search is feasible only for small to moderate M.
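A minimal sketch of such an exhaustive search is given below, under stated assumptions: subsets are enumerated as bitmasks, and the subset-level criterion assess is a hypothetical stand-in (squared distance between class-mean vectors, normalized by within-class variance), not the measure q_o used in the text.

import numpy as np

def assess(Xs, y):
    """Hypothetical subset-level criterion (a stand-in, not the text's q_o):
    squared distance between the two class-mean vectors, normalized by the
    summed within-class variances of the selected features."""
    a, b = Xs[y == 0], Xs[y == 1]
    d = a.mean(axis=0) - b.mean(axis=0)
    return float(d @ d) / (a.var(axis=0).sum() + b.var(axis=0).sum() + 1e-12)

def exhaustive_search(X, y):
    """Evaluate all 2^M - 1 non-empty feature subsets and return the global
    optimum under the criterion. The subset count doubles with every added
    feature, which is what makes this infeasible for larger M."""
    M = X.shape[1]
    best_score, best_subset = -np.inf, None
    for mask in range(1, 1 << M):                  # every non-empty subset
        cols = [j for j in range(M) if mask & (1 << j)]
        score = assess(X[:, cols], y)
        if score > best_score:
            best_score, best_subset = score, cols
    return best_subset, best_score

With one criterion evaluation per second, the loop body executes about 2^M times, which reproduces the runtimes quoted in the following paragraph for M = 12 and M = 16.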
Let us assume that computing the assessment measure q_o, which depends on the sample set size N as O(N^2), takes one second on a standard computer. An exhaustive search for M = 12 will then consume 2^12 = 4096 seconds, which amounts to approximately 1 hour and 8 minutes of computation time. For M = 16, more than 18 hours of computation time would be required. It is obvious that for larger databases, either for classification or for