tion error. For data analysis, the aim is to discover all involved variables and the underlying knowledge. Feature selection can be understood as the computation of a constrained matrix A_S for a linear mapping of the following form:

$$
A_S = \begin{pmatrix}
c_1 & 0 & \cdots & 0 \\
0 & c_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & c_M
\end{pmatrix},
\qquad (2.13)
$$

where only the diagonal elements can have nonzero values and the c_i ∈ {0, 1} are binary variables, or switch variables, determined by a preceding optimization process. Thus, a linear mapping y = Ax is constituted. However, due to the constrained matrix A and the fact that column vectors with c_i = 0 can be omitted entirely, the computation can be simplified to y = [y_1, y_2, ..., y_m]^T with y_i = x_j ∀ c_j ≠ 0, i.e., m corresponds to the number of c_j ≠ 0, and the corresponding features x_j are just copied to the y_i.
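To make this concrete, the following is a minimal sketch in Python/NumPy, with a hypothetical switch vector c and feature vector x, showing that applying the constrained matrix A_S of (2.13) amounts to simply copying the selected components:

```python
import numpy as np

# Hypothetical switch variables c_i in {0, 1} for M = 5 features,
# e.g., as delivered by a preceding optimization process.
c = np.array([1, 0, 1, 0, 1])
x = np.array([0.7, 2.3, -1.1, 0.4, 3.0])  # one feature vector, dimension M

# Literal form: y = A_S x with A_S = diag(c_1, ..., c_M).
A_S = np.diag(c)
y_full = A_S @ x                # still M-dimensional, zeros where c_i = 0

# Simplified form: omit the columns with c_i = 0 entirely, so that
# y = [y_1, ..., y_m]^T just copies the m selected features x_j.
y = x[c != 0]                   # m-dimensional, m = number of c_j != 0

assert np.array_equal(y, y_full[c != 0])
```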
Feature selection thus performs a scaling of the feature or coordinate axes by binary variables, i.e., it switches off dimensions and thereby defines a subspace that is salient with regard to the chosen criterion J. As no rotation of the basis vectors is carried out, explicit interpretability of the result is sustained. However, due to the binary nature of the selection process, differences in the importance or impact of individual features are occluded. A straightforward extension of the binary matrix A_S given for feature selection is feasible, which allows a continuous-valued ranking of the features: the binary c_i are replaced by real variables a_i ∈ [0, 1], which are determined by a preceding optimization process. The limitation or normalization to [0, 1] is introduced for the sake of interpretability and comparison with corresponding feature selection results. This approach, commonly denoted as feature weighting (FW), allows a continuous scaling of the feature or coordinate axes for a_i ≠ 0. Those columns with a_i = 0 can be omitted, reducing the matrix from M × M to M × m with m ≤ M.
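Analogously, a minimal sketch of the weighted mapping, under the same hypothetical setup, with real-valued weights a_i ∈ [0, 1] in place of the binary switches:

```python
import numpy as np

# Hypothetical weights a_i in [0, 1], again assumed to come from a
# preceding optimization process; a_i = 0 switches a feature off,
# intermediate values scale its axis continuously.
a = np.array([1.0, 0.0, 0.35, 0.0, 0.8])
x = np.array([0.7, 2.3, -1.1, 0.4, 3.0])

# Columns with a_i = 0 can be omitted, reducing the M x M weighting
# matrix to an M x m one (here applied directly to the feature vector).
keep = a != 0
y = a[keep] * x[keep]           # m-dimensional weighted feature vector

# Equivalent literal mapping with the full diagonal matrix:
assert np.allclose(y, (np.diag(a) @ x)[keep])
```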
Thus, in addition to the aspired higher discrimination and better generalization properties, explicit salient information for data analysis purposes and for rule weighting is extracted by this method. One particular method of finding appropriate a_i based on a certain cost function J and a gradient descent technique can be found in [2.21]. Numerous other options with regard to the chosen J and the optimization strategy, e.g., evolutionary computation, are feasible [2.45] and are currently being pursued in ongoing work. Various strategies and methods for feature selection will be discussed after the presentation of relevant cost functions J.
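As an illustration of such a gradient-based determination of the a_i, the following sketch uses an assumed, differentiable stand-in cost (a simple ratio of between-class to within-class scatter in the weighted feature space); it is not the J of [2.21], and all names and parameters are illustrative:

```python
import numpy as np

def J(a, X, labels):
    """Assumed stand-in cost: between-class to within-class scatter
    ratio of the weighted features (illustrative, not the J of [2.21])."""
    Xw = X * a                                  # scale each feature axis by a_i
    mu = Xw.mean(axis=0)
    between, within = 0.0, 0.0
    for cls in np.unique(labels):
        Xc = Xw[labels == cls]
        mu_c = Xc.mean(axis=0)
        between += len(Xc) * np.sum((mu_c - mu) ** 2)
        within += np.sum((Xc - mu_c) ** 2)
    return between / within

def optimize_weights(X, labels, steps=200, lr=0.05, eps=1e-4):
    """Gradient ascent on J, with the a_i constrained to [0, 1]."""
    M = X.shape[1]
    a = np.full(M, 0.5)                         # start all weights mid-range
    for _ in range(steps):
        grad = np.zeros(M)                      # numerical gradient of J
        for i in range(M):
            d = np.zeros(M)
            d[i] = eps
            grad[i] = (J(a + d, X, labels) - J(a - d, X, labels)) / (2 * eps)
        a = np.clip(a + lr * grad, 0.0, 1.0)    # ascent step, keep a_i in [0, 1]
    return a

# Hypothetical data: two classes that differ only in feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:50, 0] += 3.0
labels = np.repeat([0, 1], 50)
print(optimize_weights(X, labels))              # feature 0 should get the largest weight
```

The numerical gradient keeps the sketch independent of the concrete form of J; an analytical gradient is preferable whenever J permits it.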
Cost Functions. In the following, from a larger collection of potential cost or assessment functions summarized in Fig. 2.12, dedicated cost functions for feature space assessment introduced in prior work, e.g., [2.27], [2.28], and [2.22], will be briefly presented for the sake of a self-contained presentation. These serve to measure discrimination in terms of the separability of class regions,