a nonzero risk of keeping an irrelevant variable, or of discarding a relevant
variable. Therefore, the following procedure is used (a code sketch of the
complete loop is given after the list):
• Orthogonalize the output and the inputs with respect to the m − 1 inputs
  selected during the previous m − 1 iterations.
• In the subspace of dimension q − m, select the input that is most correlated
  to the projected output.
• Compute the probability that the rank of the probe feature be lower than
  or equal to the rank of the feature under examination, i.e., the probability
  that the probe feature be more relevant than the input under consider-
  ation. The computation of that quantity is explained in the additional
  material at the end of the chapter.
• If that probability is lower than the risk, chosen by the designer, that a
  variable be kept although it is less relevant than the probe feature, keep the
  feature under consideration and iterate the procedure; otherwise, discard
  the feature and terminate the procedure.
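The following Python sketch puts the whole loop together, assuming the candidate inputs are the columns of a matrix X and the output is a vector y. It is a minimal illustration, not the authors' implementation: the function names are ours, and the probe-rank probability is estimated by Monte Carlo (repeated random draws of the probe) instead of the analytic computation given in the additional material at the end of the chapter.

```python
import numpy as np

def deflate(M, u):
    # Orthogonalize the columns of M with respect to the unit vector u
    # (one Gram-Schmidt deflation step).
    return M - np.outer(u, u @ M)

def sq_cos(cols, y):
    # Squared cosine between each column of `cols` and the vector y;
    # in the projected space this is the squared correlation with the output.
    return (cols.T @ y) ** 2 / ((cols ** 2).sum(axis=0) * (y @ y) + 1e-300)

def probe_selection(X, y, risk=0.10, n_probes=1000, seed=0):
    # Gram-Schmidt forward selection stopped by the random probe.
    # The probe-rank probability is estimated empirically from `n_probes`
    # draws; the chapter's additional material derives it analytically.
    rng = np.random.default_rng(seed)
    Xp = np.asarray(X, dtype=float).copy()
    yp = np.asarray(y, dtype=float).copy()
    remaining = list(range(Xp.shape[1]))
    basis, selected = [], []          # kept directions and original indices
    while remaining:
        scores = sq_cos(Xp[:, remaining], yp)
        k = int(np.argmax(scores))    # most correlated remaining input
        # Estimate P(probe at least as relevant as the candidate): draw
        # random probes and project them exactly like the true inputs.
        probes = rng.standard_normal((Xp.shape[0], n_probes))
        for u in basis:
            probes = deflate(probes, u)
        p_probe = np.mean(sq_cos(probes, yp) >= scores[k])
        if p_probe > risk:            # probe too likely to be more relevant:
            break                     # discard the candidate and stop
        j = remaining.pop(k)
        selected.append(j)
        u = Xp[:, j] / np.linalg.norm(Xp[:, j])
        basis.append(u)
        Xp = deflate(Xp, u)           # orthogonalize inputs and output with
        yp = yp - u * (u @ yp)        # respect to the newly selected input
    return selected
```

Projecting the probes with the same basis as the true inputs guarantees that the probe and the candidates are compared in the same subspace at every iteration.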
Example 1
In order to illustrate that input selection method, we consider a simulated pro-
cess, described in [Lagarde 1983] and also investigated in [Stoppiglia 1998]3,
[Stoppiglia et al. 2003]. Ten variables are candidate inputs, of which only five
are relevant.
Figure 2.4 shows the cumulative distribution function of the rank of the
probe feature. It shows that if the five most relevant inputs are selected, the
probability that the rank of the probe feature be smaller than or equal to 5
(i.e., that one of the 5 selected inputs be less relevant than the probe feature)
is smaller than 10%. If 6 inputs are selected, the probability is larger than
10%. Therefore, if the designer is willing to accept a risk of 10%, then the first
5 inputs should be selected: that is exactly the number of relevant inputs. If
the designer is willing to accept a higher risk of keeping an irrelevant input,
20% for instance, then the graph shows that the first 6 features should be kept.
Thus, as in any statistical method, a tradeoff must be made between the
risk of designing an oversized model and the risk of designing too small a model.
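As a minimal numerical illustration of that tradeoff, the fragment below reads the number of inputs allowed by a given risk off the cumulative distribution function of the probe rank. The CDF values are invented so as to reproduce the behavior described above; they are not the values of Fig. 2.4.

```python
# cdf[n-1] = P(rank of the probe feature <= n); illustrative values only.
cdf = [0.00, 0.01, 0.02, 0.04, 0.08, 0.17, 0.30, 0.50, 0.75, 1.00]

def n_selected(cdf, risk):
    # Largest number of inputs for which the probability that the probe
    # outranks one of them stays below the accepted risk.
    n = 0
    for p in cdf:
        if p >= risk:
            break
        n += 1
    return n

print(n_selected(cdf, 0.10))  # -> 5: the five relevant inputs are kept
print(n_selected(cdf, 0.20))  # -> 6: a higher risk keeps one more feature
```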
Example 2
In a classification problem, synthetic data were generated in which only 2
variables, out of 240 candidate variables, were relevant [Stoppiglia et al. 2003];
the other 238 variables were purely random. The probe feature method was
tested on 100 different such databases: it discovered at least 1 true variable
in all cases, and
discovered both true features in 74% of the cases. A hypothesis test showed
that, when only one true variable was found, the classification performances
of the model were not significantly different from those of models that used
both true variables.
3. That thesis is available from URL http://www.neurones.espci.fr.
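A sketch of such an experiment, reusing the probe_selection function given earlier, is shown below. The data-generating recipe (labels determined by the first 2 of 240 Gaussian variables) is an assumption of ours, not the construction used in [Stoppiglia et al. 2003], so the hit rates it produces are merely indicative.

```python
import numpy as np

def make_dataset(n=100, q=240, seed=0):
    # Two informative variables out of q candidates; the rest pure noise.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, q))
    y = np.where(X[:, 0] + X[:, 1] > 0.0, 1.0, -1.0)  # label from 2 variables
    return X, y

found_one = found_both = 0
for trial in range(100):              # 100 different synthetic databases
    X, y = make_dataset(seed=trial)
    kept = set(probe_selection(X, y, risk=0.10, seed=trial))
    found_one += len(kept & {0, 1}) >= 1   # at least one true variable kept
    found_both += kept >= {0, 1}           # both true variables kept
print(found_one, found_both)
```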