a nonzero risk of keeping an irrelevant variable, or of discarding a relevant
variable. Therefore, the following procedure is used (a code sketch of the
complete loop is given after the list):
• Orthogonalize the output and the inputs with respect to the m − 1 inputs
  selected during the previous m − 1 iterations.
• In the subspace of dimension q − m, select the input that is most correlated
  to the projected output.
• Compute the probability that the rank of the probe feature be lower than
  or equal to the rank of the feature under examination, i.e., the probability
  that the probe feature be more relevant than the input under consider-
  ation. The computation of that quantity is explained in the additional
  material at the end of the chapter.
• If that probability is lower than the risk, chosen by the designer, that a
  variable be kept although it is less relevant than the probe feature, keep the
  feature under consideration and iterate the procedure; otherwise, discard
  the feature and terminate the procedure.
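The following Python sketch puts the whole loop together, assuming the candidate inputs are the columns of a matrix X and the output is a vector y. It is a minimal illustration, not the authors' implementation: the function names are ours, and the probe-rank probability is estimated by Monte Carlo (repeated random draws of the probe) instead of the analytic computation given in the additional material at the end of the chapter.

```python
import numpy as np

def deflate(M, u):
    # Orthogonalize the columns of M with respect to the unit vector u
    # (one Gram-Schmidt deflation step).
    return M - np.outer(u, u @ M)

def sq_cos(cols, y):
    # Squared cosine between each column of `cols` and the vector y;
    # in the projected space this is the squared correlation with the output.
    return (cols.T @ y) ** 2 / ((cols ** 2).sum(axis=0) * (y @ y) + 1e-300)

def probe_selection(X, y, risk=0.10, n_probes=1000, seed=0):
    # Gram-Schmidt forward selection stopped by the random probe.
    # The probe-rank probability is estimated empirically from `n_probes`
    # draws; the chapter's additional material derives it analytically.
    rng = np.random.default_rng(seed)
    Xp = np.asarray(X, dtype=float).copy()
    yp = np.asarray(y, dtype=float).copy()
    remaining = list(range(Xp.shape[1]))
    basis, selected = [], []          # kept directions and original indices
    while remaining:
        scores = sq_cos(Xp[:, remaining], yp)
        k = int(np.argmax(scores))    # most correlated remaining input
        # Estimate P(probe at least as relevant as the candidate): draw
        # random probes and project them exactly like the true inputs.
        probes = rng.standard_normal((Xp.shape[0], n_probes))
        for u in basis:
            probes = deflate(probes, u)
        p_probe = np.mean(sq_cos(probes, yp) >= scores[k])
        if p_probe > risk:            # probe too likely to be more relevant:
            break                     # discard the candidate and stop
        j = remaining.pop(k)
        selected.append(j)
        u = Xp[:, j] / np.linalg.norm(Xp[:, j])
        basis.append(u)
        Xp = deflate(Xp, u)           # orthogonalize inputs and output with
        yp = yp - u * (u @ yp)        # respect to the newly selected input
    return selected
```

Projecting the probes with the same basis as the true inputs guarantees that the probe and the candidates are compared in the same subspace at every iteration.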
Example 1
In order to illustrate that input selection method, we consider a simulated pro-
cess, described in [Lagarde 1983] and also investigated in [Stoppiglia 1998]3,
[Stoppiglia et al. 2003]. Ten variables are candidate inputs, of which only five
are relevant.
Figure 2.4 shows the cumulative distribution function of the rank of the
probe feature. It shows that if the five most relevant inputs are selected, the
probability that the rank of the probe feature be smaller than or equal to 5
(i.e., that one of the 5 selected inputs be less relevant than the probe feature)
is smaller than 10%. If 6 inputs are selected, the probability is larger than
10%. Therefore, if the designer is willing to accept a risk of 10%, then the first
5 inputs should be selected: that is exactly the number of relevant inputs. If
the designer is willing to accept a higher risk of keeping an irrelevant input,
20% for instance, then the graph shows that the first 6 features should be kept.
Thus, as in any statistical method, a tradeoff must be made between the
risk of designing an oversized model and the risk of designing too small a model.
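As a minimal numerical illustration of that tradeoff, the fragment below reads the number of inputs allowed by a given risk off the cumulative distribution function of the probe rank. The CDF values are invented so as to reproduce the behavior described above; they are not the values of Fig. 2.4.

```python
# cdf[n-1] = P(rank of the probe feature <= n); illustrative values only.
cdf = [0.00, 0.01, 0.02, 0.04, 0.08, 0.17, 0.30, 0.50, 0.75, 1.00]

def n_selected(cdf, risk):
    # Largest number of inputs for which the probability that the probe
    # outranks one of them stays below the accepted risk.
    n = 0
    for p in cdf:
        if p >= risk:
            break
        n += 1
    return n

print(n_selected(cdf, 0.10))  # -> 5: the five relevant inputs are kept
print(n_selected(cdf, 0.20))  # -> 6: a higher risk keeps one more feature
```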
Example 2
In a classification problem, synthetic data were generated in which only 2
variables, out of 240 candidate variables, were relevant [Stoppiglia et al. 2003];
the other 238 variables were purely random. The probe feature method was
tested on 100 different such databases: it discovered at least 1 true variable
in all cases, and
discovered both true features in 74% of the cases. A hypothesis test showed
that, when only one true variable was found, the classification performances
of the model were not significantly different from those of models that used
both true variables.
3. That thesis is available from URL http://www.neurones.espci.fr.
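A sketch of such an experiment, reusing the probe_selection function given earlier, is shown below. The data-generating recipe (labels determined by the first 2 of 240 Gaussian variables) is an assumption of ours, not the construction used in [Stoppiglia et al. 2003], so the hit rates it produces are merely indicative.

```python
import numpy as np

def make_dataset(n=100, q=240, seed=0):
    # Two informative variables out of q candidates; the rest pure noise.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, q))
    y = np.where(X[:, 0] + X[:, 1] > 0.0, 1.0, -1.0)  # label from 2 variables
    return X, y

found_one = found_both = 0
for trial in range(100):              # 100 different synthetic databases
    X, y = make_dataset(seed=trial)
    kept = set(probe_selection(X, y, risk=0.10, seed=trial))
    found_one += len(kept & {0, 1}) >= 1   # at least one true variable kept
    found_both += kept >= {0, 1}           # both true variables kept
print(found_one, found_both)
```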