Information Technology Reference
In-Depth Information
the predictive accuracy, according to the wrapper approach, or computed from
the information content of the subset itself (e.g., inter-class distance, correlation
or information theoretic measures), according to the filters approach. With fea-
ture selection selected features maintain their original physical interpretation,
which can be useful to understand the physical process that generates patterns.
Moreover it may lead to a reduction of measurement and/or computational cost
(only the selected features will need to be computed) and, if the feature selection
operation selected a smaller subset of sensors than the initial one, this may lead
to a potential saving in the production process. On the other hand, the process
of finding the best feature subset can be computationally very expensive, and it
could be necessary to settle for a suboptimal solution. In order to find the best
subset several search strategies can be adopted [10]:
- Best firsts ranks each feature according to the chosen objective function, and
then keeps only the first M features. Unfortunately, this simple approach
fails quite always because it does not consider features with complementary
information.
- Exhaustive search considers all the possible combinations of feature subsets.
Although this is the only approach that can guarantee to find the optimal
subset, it is rarely used since it is unfeasible even with moderate M and N
values.
- Exponential algorithms evaluate a number of subsets that grows exponen-
tially with the dimensionality of the search space. Among these, the branch
& bound technique is very popular as it guarantees to find the best feature
subset if the monoticity assumption of the objective function is verified. The
monoticity assumption implies that the addition of a new feature always
improves the information content of the subset; unfortunately, in practical
problems, this is rarely verified.
- Sequential algorithms are a greedy strategy that reduces the number of states
to be visited during the search by applying local search. In sequential and
backward feature selection, an individual feature is respectively added or
removed sequentially evaluating the objective function of the new subset
obtained adding or removing that feature (and not of individual features as
in the best first approach). The performance may be improved by means of
sequential floating feature selection with backtracking capabilities.
- Randomized algorithms incorporate randomness into the search procedure in
order to attempt to overcome the computational cost of exponential methods
and the tendency of sequential methods to fall into local minima. Among
these techniques, the random generation plus sequential selection is widely
used; another popular randomized approach is the simulated annealing, that
is based on the annealing process of thermal systems and performs a stochas-
tic search on a single solution. Another very powerful randomized method is
based on genetic algorithms, that are an optimization technique that mim-
ics the evolutionary process of survival of the best; taking inspiration by the
process of natural selection, it performs a global random search on popula-
tion of solutions. In this work a genetic algorithm has been implemented not
 
Search WWH ::




Custom Search