Table 3. Variable selection, standardization, and variable weighting decisions

Data set    Original variables    Selected variables    Selection method    Standardize?    Weighting?
Madelon             500                    19                  RF               Yes             No
Dexter           20,000                   500                  MI               Yes             By MI
Arcene           10,000                10,000                  None             No              No
Gisette           5,000                   307                  RF               No              No
Dorothea        100,000                   284                  RF               No              No
For each data set, the smallest variable set indicated by a cut-off point was tried first. If the results were unsatisfactory, the next cut-off point was tried, and so on, until satisfactory results were obtained. The maximum number of variables considered was about 500. Full cross-validation was therefore not performed over the whole possible range of the number of selected variables.
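To make the procedure concrete, the following is a minimal sketch of such a cut-off search, assuming scikit-learn; the cut-off values, the error threshold, and the use of an RF as the evaluation classifier are illustrative assumptions, not the exact configuration used in the experiments.

```python
# Sketch of the cut-off search described above (illustrative, not the
# authors' exact setup): variables are ranked by RF importance, and
# increasingly large variable sets are tried until the cross-validation
# error is satisfactory or roughly 500 variables have been reached.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def select_by_cutoffs(X, y, cutoffs=(20, 50, 100, 200, 500), target_error=0.10):
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]  # most important first
    for k in cutoffs:
        idx = order[:k]
        cv_acc = cross_val_score(
            RandomForestClassifier(n_estimators=500, random_state=0),
            X[:, idx], y, cv=5).mean()
        if 1.0 - cv_acc <= target_error:      # satisfactory result: stop here
            return idx, 1.0 - cv_acc
    return order[:cutoffs[-1]], 1.0 - cv_acc  # fall back to the largest set
```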
The variable set was thereafter fixed to the one that produced the smallest cross-validation error in the classification experiments, with two exceptions. Contrary to the other data sets, on Arcene the error rate on the validation set did not follow the cross-validation error but was smallest when all variables were used. Arcene is evidently such a small data set that variable selection and classifier training, both using the same 100 training samples, overfit. The second exception is Dexter, which gave the best results using the 500 variables ranked by maximum mutual information (MI) with the class labels [18].
At this point we also experimented with variable standardization and variable weighting. Weighting here denotes multiplying each variable by the importance score given by the RF (or by MI). Due to lack of space these experiments are not tabulated, but the decisions are summarized in Table 3.
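As an illustration of this step, here is a minimal sketch assuming scikit-learn: variables are optionally standardized and then multiplied by their RF importance scores (for Dexter, MI scores would take the place of the RF importances). The function name and defaults are hypothetical.

```python
# Sketch of standardization followed by importance weighting: each
# variable is multiplied by its RF importance score. Mutual information
# scores would be substituted for the RF importances where applicable.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

def standardize_and_weight(X_train, X_test, y_train, standardize=True):
    if standardize:
        scaler = StandardScaler().fit(X_train)
        X_train = scaler.transform(X_train)
        X_test = scaler.transform(X_test)
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(X_train, y_train)
    w = rf.feature_importances_        # one importance score per variable
    return X_train * w, X_test * w     # importance-weighted copies
```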
5.2 Classification Experiments with ELSCs using Random Kernels
An individual RLSC has two parameters that need to be determined by cross-validation: the kernel width σ² and the regularization parameter γ. For a single RLSC, regularization is critical in order not to overfit. The choice of the parameters appears to be highly data dependent, which leads to an optimization in a two-dimensional parameter space using cross-validation. As an example, we present this optimization for the Madelon data set in Fig. 2.
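The following sketch illustrates such a two-dimensional cross-validation search for a single RLSC with a Gaussian kernel, using one common RLSC formulation that solves the linear system (K + γnI)c = y; the grid values and fold count are illustrative assumptions, not the settings behind Fig. 2.

```python
# Sketch of the two-dimensional (sigma^2, gamma) grid search for a
# single RLSC. Labels y are assumed to be +/-1; grid values are
# illustrative assumptions.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import KFold

def rlsc_cv_error(X, y, sigma2, gamma, n_splits=5):
    """Cross-validation error of one RLSC: solve (K + gamma*n*I) c = y
    on each training fold, predict with sign(K_test c)."""
    err = 0.0
    for tr, te in KFold(n_splits, shuffle=True, random_state=0).split(X):
        width = 1.0 / (2 * sigma2)                      # exp(-||x-y||^2 / (2 sigma^2))
        K = rbf_kernel(X[tr], X[tr], gamma=width)
        c = np.linalg.solve(K + gamma * len(tr) * np.eye(len(tr)), y[tr])
        pred = np.sign(rbf_kernel(X[te], X[tr], gamma=width) @ c)
        err += np.mean(pred != y[te]) / n_splits
    return err

def grid_search(X, y, sigma2s=(0.1, 1, 10, 100), gammas=(1e-6, 1e-4, 1e-2, 1)):
    """Evaluate every (sigma^2, gamma) pair; return the best pair."""
    errors = {(s2, g): rlsc_cv_error(X, y, s2, g)
              for s2 in sigma2s for g in gammas}
    return min(errors, key=errors.get), errors
```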
An ensemble of stochastic LSCs is less sensitive to the kernel width, does not require a search for the regularization parameter, is not sensitive to the ensemble size (once it is large enough), and is not very sensitive to the fraction