Table 3. Variable selection, standardization, and variable weighting decisions

Data set    Original variables    Selected variables    Selection method    Standardize?    Weighting?
Madelon             500                    19                  RF               Yes             No
Dexter           20,000                   500                  MI               Yes             By MI
Arcene           10,000                10,000                  None             No              No
Gisette           5,000                   307                  RF               No              No
Dorothea        100,000                   284                  RF               No              No
For each data set, the smallest variable set indicated by a cut-off point was tried first. If the results were unsatisfactory, the next cut-off point was tried, and so on, until satisfactory results were obtained. The maximum number of variables considered was about 500. Full cross-validation was therefore not performed over the whole possible range of the number of selected variables.
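To make the procedure concrete, the following is a minimal sketch of such a cut-off search, assuming scikit-learn; the cut-off values, the error threshold, and the use of an RF as the evaluation classifier are illustrative assumptions, not the exact configuration used in the experiments.

```python
# Sketch of the cut-off search described above (illustrative, not the
# authors' exact setup): variables are ranked by RF importance, and
# increasingly large variable sets are tried until the cross-validation
# error is satisfactory or roughly 500 variables have been reached.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def select_by_cutoffs(X, y, cutoffs=(20, 50, 100, 200, 500), target_error=0.10):
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]  # most important first
    for k in cutoffs:
        idx = order[:k]
        cv_acc = cross_val_score(
            RandomForestClassifier(n_estimators=500, random_state=0),
            X[:, idx], y, cv=5).mean()
        if 1.0 - cv_acc <= target_error:      # satisfactory result: stop here
            return idx, 1.0 - cv_acc
    return order[:cutoffs[-1]], 1.0 - cv_acc  # fall back to the largest set
```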
The variable set was thereafter fixed to the one that produced the smallest cross-validation error in the classification experiments, with two exceptions. Contrary to the other data sets, on Arcene the error rate on the validation set did not follow the cross-validation error but was smallest when all variables were used. Arcene is evidently such a small data set that variable selection and classifier training, both using the same 100 training samples, overfit. The second exception is Dexter, which gave the best results using the 500 variables ranked by maximum mutual information (MI) with the class labels [18].
At this point we also experimented with variable standardization and variable weighting. Weighting here denotes multiplying each variable by the importance score given by the RF (or by MI). Due to lack of space these experiments are not tabulated, but the decisions are summarized in Table 3.
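As an illustration of this step, here is a minimal sketch assuming scikit-learn: variables are optionally standardized and then multiplied by their RF importance scores (for Dexter, MI scores would take the place of the RF importances). The function name and defaults are hypothetical.

```python
# Sketch of standardization followed by importance weighting: each
# variable is multiplied by its RF importance score. Mutual information
# scores would be substituted for the RF importances where applicable.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

def standardize_and_weight(X_train, X_test, y_train, standardize=True):
    if standardize:
        scaler = StandardScaler().fit(X_train)
        X_train = scaler.transform(X_train)
        X_test = scaler.transform(X_test)
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(X_train, y_train)
    w = rf.feature_importances_        # one importance score per variable
    return X_train * w, X_test * w     # importance-weighted copies
```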
5.2 Classification Experiments with ELSCs using Random Kernels
An individual RLSC has two parameters that need to be determined by cross-validation: the kernel width σ² and the regularization parameter γ. For a single RLSC, regularization is critical in order not to overfit. The choice of the parameters appears to be highly data dependent, which leads to an optimization in a two-dimensional parameter space using cross-validation. As an example, we present this optimization for the Madelon data set in Fig. 2.
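The following sketch illustrates such a two-dimensional cross-validation search for a single RLSC with a Gaussian kernel, using one common RLSC formulation that solves the linear system (K + γnI)c = y; the grid values and fold count are illustrative assumptions, not the settings behind Fig. 2.

```python
# Sketch of the two-dimensional (sigma^2, gamma) grid search for a
# single RLSC. Labels y are assumed to be +/-1; grid values are
# illustrative assumptions.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import KFold

def rlsc_cv_error(X, y, sigma2, gamma, n_splits=5):
    """Cross-validation error of one RLSC: solve (K + gamma*n*I) c = y
    on each training fold, predict with sign(K_test c)."""
    err = 0.0
    for tr, te in KFold(n_splits, shuffle=True, random_state=0).split(X):
        width = 1.0 / (2 * sigma2)                      # exp(-||x-y||^2 / (2 sigma^2))
        K = rbf_kernel(X[tr], X[tr], gamma=width)
        c = np.linalg.solve(K + gamma * len(tr) * np.eye(len(tr)), y[tr])
        pred = np.sign(rbf_kernel(X[te], X[tr], gamma=width) @ c)
        err += np.mean(pred != y[te]) / n_splits
    return err

def grid_search(X, y, sigma2s=(0.1, 1, 10, 100), gammas=(1e-6, 1e-4, 1e-2, 1)):
    """Evaluate every (sigma^2, gamma) pair; return the best pair."""
    errors = {(s2, g): rlsc_cv_error(X, y, s2, g)
              for s2 in sigma2s for g in gammas}
    return min(errors, key=errors.get), errors
```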
An ensemble of stochastic LSCs is less sensitive to the kernel width, does not require a search for the regularization parameter, is not sensitive to the ensemble size (once it is large enough), and is not very sensitive to the fraction